Video encoding with spatially variable quantization adaptable to content
Patent abstract:
A video encoder can be configured to apply a multi-stage quantization process, where residuals are first quantized using an effective quantization parameter derived from block sample statistics. The residual is then further quantized using a base quantization parameter that is uniform across an image. A video decoder can be configured to decode the video data using the base quantization parameter. The video decoder can be further configured to estimate the effective quantization parameter from statistics of the decoded samples of the block. The video decoder can then use the estimated effective quantization parameter in determining parameters for other encoding tools, including filters.
Publication number: BR112020006985A2
Application number: R112020006985-0
Filing date: 2018-10-10
Publication date: 2020-10-06
Inventors: Dmytro Rusanovskyy; Adarsh Krishnan Ramasubramonian
Applicant: Qualcomm Incorporated
Patent description:
[0001] [0001] This application claims the benefit of US Provisional Patent Application No. 62/571,732 filed on Thursday, October 12, 2017 and claims priority from US Application 16/155,344 filed on Tuesday, October 9, 2018 , the entire contents of which are incorporated herein by reference. FIELD OF TECHNIQUE [0002] [0002] This disclosure pertains to video encoding and/or video processing. BACKGROUND [0003] [0003] Digital video capabilities can be built into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, readers e-books, digital cameras, recording devices, digital media players, video game devices, video game consoles, cell phones or satellite radio phones, so-called “smartphones”, video teleconferencing devices, video streaming devices and similar. Digital video devices implement video encoding techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Video Encoding. Advanced Video (AVC), ITU-T H.265, High Efficiency Video Coding (HEVC), and extensions of such standards. video devices can transmit, receive, encode, [0004] [0004] Video encoding techniques include spatial prediction (intra-picture) and/or temporal prediction (inter-picture) to reduce or remove inherent redundancy in video sequences. For block-based video encoding, a video slice (e.g. a video frame or a portion of a video frame) can be partitioned into video blocks, which can also be called tree blocks, encoding units (CUs) and/or encoding nodes. The video blocks in an intracoded slice (1) of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image. Video blocks in an intercoded image slice (P or B) can use spatial prediction against reference samples in neighboring blocks in the same image or temporal prediction against reference samples in other reference images. Images can be called frames, and reference images can be called reference frames. [0005] [0005] Spatial or temporal prediction results in a predictive block for a block that will be encoded. Residual data represents pixel differences between the original block that will be encoded and the predictive block. An intercoded block is coded according to a motion vector that points to a block of reference samples that form the predictive block, and the residual data that indicates the difference between the coded block and the predictive block. An intracoded block is coded according to an intracoding mode and the residual data. For further compression, the residual data can be transformed from the pixel domain into a transform domain, resulting in residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, can be swept to produce a one-dimensional vector of transform coefficients, and entropy coding can be applied to perform even more compression. [0006] [0006] The total number of color values that can be captured, encoded and displayed can be defined by a color gamut. A color gamut refers to the range of colors that a device can capture (eg a camera) or reproduce (eg a screen) Color gamuts often differ from device to device. For video encoding, a predefined color gamut for video data can be used so that each device in the video encoding process can be configured to process pixel values in the same color gamut. 
Some color gamuts are defined with a wider color range than the color gamuts that have traditionally been used for video encoding. Such color gamuts with a larger color gamut can be called a wide color gamut (WCG). [0007] [0007] Another aspect of video data is dynamic range. Dynamic range is typically defined as the ratio between the minimum and maximum brightness (eg luminance) of a video signal. The dynamic range of previously used common video data is considered to have a standard dynamic range (SDR). Other examples of video data specifications define color data that have a higher ratio of minimum to maximum brightness. Such video data can be described as having a high dynamic range (FIDR). SUMMARY [0008] [0008] This disclosure describes examples of processing methods (and devices configured to perform the methods) applied in the encoding loop (eg encoding or decoding) of a video encoding system. The techniques of this disclosure are applicable for encoding representations of video data with only perceptible perceived difference not uniformly distributed (eg, signal-to-noise ratio) of the video data over its dynamic range. A video encoder can be configured to apply a multi-stage quantization process, where residuals are first quantized using an effective quantization parameter derived from the block sample statistics. The residual is then further quantized using a base quantization parameter that is uniform across an image. A video decoder can be configured to decode the video data using the base quantization parameter. The video decoder can be further configured to estimate the effective quantization parameter from the statistic of the decoded samples of the block. The video decoder can then use the estimated effective quantization parameter for use in determining parameters for other encoding tools, including filters. In this way, signaling overhead is prevented as the effective quantization parameter is not signaled but is estimated on the decoder side. [0009] [0009] In one example, this disclosure describes a method of decoding video data, the method comprising receiving an encoded block of video data, wherein the encoded block of video data has been encoded using an effective quantization parameter. and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset plus the base quantization parameter, determining the base quantization parameter used to encode the encoded block of the video data , decoding the encoded block of video data using the base quantization parameter to create a decoded block of video data, determining an estimate of the quantization parameter offset for the decoded block of video data based on the statistic associated with the block decoded from the video data, add the quantization parameter offset estimate to the quantization parameter of ba if to create an estimate of the effective quantization parameter, and perform one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter. 
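As a rough sketch of the idea summarized above, the following Python fragment shows one way an encoder could derive an effective quantization parameter as a base quantization parameter plus an offset computed from a block statistic, and how a decoder could re-estimate that offset from the decoded samples without any explicit signaling. The choice of mean luma as the statistic and the particular statistic-to-offset rule are illustrative assumptions only; the disclosure does not fix them to these values.

```python
# Illustrative sketch only: the block statistic (mean luma) and the
# statistic-to-offset mapping are assumptions, not the mapping defined
# by this disclosure.

def qp_offset_from_stats(mean_luma, bit_depth=10):
    """Hypothetical mapping from a block's mean luma to a QP offset."""
    # Dark PQ-coded blocks are over-provisioned with codewords, so in this
    # sketch they receive a positive offset (coarser quantization), while
    # bright blocks keep the base QP.  The numbers are made up.
    t = mean_luma / ((1 << bit_depth) - 1)     # normalize to [0, 1]
    return max(0, int(round(6 * (1.0 - 2.0 * t))))

# Encoder side: derive the effective QP used for the first quantization stage.
def encoder_effective_qp(block_samples, base_qp):
    mean_luma = sum(block_samples) / len(block_samples)
    effective_qp = base_qp + qp_offset_from_stats(mean_luma)
    # Only base_qp is written to the bitstream; the offset is not signaled.
    return effective_qp

# Decoder side: repeat the same statistic on the *decoded* samples to
# estimate the effective QP, e.g. for configuring in-loop filters.
def decoder_estimated_effective_qp(decoded_samples, base_qp):
    mean_luma = sum(decoded_samples) / len(decoded_samples)
    return base_qp + qp_offset_from_stats(mean_luma)
```

Because the decoder applies the same derivation to the reconstructed samples of the block, both sides arrive at approximately the same effective quantization parameter without any per-block syntax, which is how the signaling overhead mentioned above is avoided.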
[0010] [0010] In another example, this disclosure describes a method of encoding video data, the method comprising determining a base quantization parameter for a block of the video data, [0011] [0011] In another example, this disclosure describes an apparatus configured to decode video data, the apparatus comprising a memory configured to store an encoded block of video data, and one or more processors in communication with the memory, the one or more processors configured to receive the encoded block of video data, wherein the encoded block of video data has been encoded using an effective quantization parameter and a base quantization parameter, where the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter, determine the base quantization parameter used to encode the encoded block of video data, decode the encoded block of video data using the base quantize parameter to create a decoded block of video data, determine an estimate of the quantization parameter offset for the decoded block quantization of the video data based on the statistic associated with the decoded block of the video data, add the quantization parameter offset estimate to the base quantization parameter to create an estimate of the effective quantization parameter, and perform one or more quantization operations. filtering on the decoded block of video data as a function of the effective quantization parameter estimation. [0012] [0012] In another example, this disclosure describes an apparatus configured to encode video data, the apparatus comprising a memory configured to store a block of video data, and one or more processors in communication with the memory, wherein o one or more processors are configured to determine a base quantization parameter for the block of video data, determine a quantization parameter offset for the block of video data based on the statistic associated with the block of video data, add shifting the quantization parameter to the base quantization parameter to create an effective quantization parameter, and encode the video data block using the effective quantization parameter and the base quantization parameter. [0013] [0013] In another example, this disclosure describes an apparatus configured to decode video data, the apparatus comprising means for receiving an encoded block of video data, wherein the encoded block of video data has been encoded using a parameter quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset plus the base quantization parameter, means for determining the base quantization parameter used to encode the block encoded video data, [0014] [0014] In another example, this disclosure describes an apparatus configured to encode video data, the apparatus comprising means for determining a base quantization parameter for a block of the video data, means for determining a parameter shift of quantization for the video data block based on the statistic associated with the video data block, means for adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter, and means for encoding the block of video data using the effective quantization parameter and the base quantization parameter. 
[0015] [0015] In another example, this disclosure describes a non-temporary computer-readable storage medium that stores instructions that, when executed, cause one or more processors to receive the encoded block of video data, with the encoded block of video data was encoded using an effective quantization parameter and a base quantization parameter, where the effective quantization parameter is a function of a quantization parameter offset plus the base quantization parameter, determine the quantization parameter of base used to encode the encoded block of video data, decode the encoded block of video data using the base quantization parameter to create a decoded block of video data, determine an estimate of the quantization parameter offset for the decoded block of the video data based on the statistic associated with the decoded block of the video data, add the quantization parameter shift to the base quantization parameter to create an estimate of the effective quantization parameter, and performing one or more filtering operations on the decoded block of video data as a function of the estimate of the effective quantization parameter. [0016] [0016] In another example, this disclosure describes a non-temporary computer-readable storage medium that stores instructions that when executed cause one or more processors to determine a base quantization parameter for a block of video data, determine a quantization parameter offset for the video data block based on the statistic associated with the video data block, add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter, and encode the block of video data using the effective quantization parameter and the base quantization parameter. [0017] [0017] Details of one or more examples are shown in the attached drawings and in the description below. Other features, objects, and advantages will become apparent from the description, drawings, and claims. BRIEF DESCRIPTION OF THE DRAWINGS [0018] [0018] Figure 1 is a block diagram illustrating an example of a video encoding and decoding system configured to implement the disclosure techniques. [0019] [0019] Figures 2A and 2B are conceptual diagrams illustrating an example of a quadtree binary tree (QTBT) structure, and a corresponding code tree unit (CTU). [0020] [0020] Figure 3 is a conceptual drawing that illustrates the concepts of HDR data. [0021] [0021] Figure 4 is a conceptual diagram illustrating examples of color gamuts. [0022] [0022] Figure 5 is a flow diagram illustrating an example of HDR/WCG representation conversion. [0023] [0023] Figure 6 is a flow diagram illustrating an example of inverse HDR/WCG conversion. [0024] [0024] Figure 7 is a conceptual diagram illustrating examples of electro-optical transfer functions (EOTF) used for converting video data (including SDR and HDR) from perceptually uniform code levels to linear luminance. [0025] [0025] Figure 8 is a block diagram illustrating an example of a video encoder that can implement the techniques of this disclosure. [0026] [0026] Figure 9 is a block diagram illustrating an example of a quantization unit of a video encoder that can implement the techniques of this disclosure. [0027] [0027] Figure 10 is a block diagram illustrating an example of a video decoder that can implement the techniques of this disclosure. [0028] [0028] Figure 11 is a flowchart illustrating an example encoding method. [0029] [0029] Figure 12 is a flowchart illustrating an example decoding method. 
DETAILED DESCRIPTION [0030] [0030] This revelation is related to the processing and/or encoding of video data with high dynamic range (HDR) and wide color gamut (WCG) representations. More specifically, the techniques of this disclosure include content-adaptive spatially variable quantization without explicit signaling of quantization parameters (e.g., a change in a quantization parameter represented by a deltaQP syntax element) to efficiently compress HDR video signals. /WCG. The techniques and devices described in this document can improve the compression efficiency of video encoding systems used to encode HDR and WCG video data. The techniques of this revelation can be used in the context of advanced video codecs such as HEVC extensions or next generation video coding standards. [0031] [0031] Video coding standards, including hybrid-based video coding standards include ITU-T H.261, ISO/IEC MPEG-1l1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including their Scalable Video Coding (SVC) and Multiview Video Coding extensions (MVC). The design of a new video coding standard, ie High Efficiency Video Coding (HEVC), also called H.265), was finalized by the Joint Collaboration Team on Video Coding (JCT-VC) of ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Motion Picture Experts Group (MPEG). A draft HEVC specification called HEVC Working Draft 10 (WD 10), Bross et al., "High efficiency video coding (HEVC) text specification draft 10 (for FDIS & Last Call)," Joint Collaborative Team on Video Coding ( JCT-VC) of ITU-T SGI 6 WP3 and ISO/IEC JTC1/SC29/WGl11, 12th Meeting: Geneva, CH, January 14-23, 2013, JCTVC-L1003v34, is available at http://phenix.int - eyry.fr/ict/doc end user/documents/12 Geneva/wgll/JCTVC-LIOO03-v34.zip. The finalized HEVC standard is called HEVC version 1. The finalized HEVC standard document is published as ITU-T H.265, Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services - Coding of moving video, High efficiency video coding, Telecommunication Standardization Sector of International Telecommunications Union (ITU), April 2013, and another version of the finalized HEVC standard was published in October 2014. A copy of the H.265/HEVC specification text can be downloaded from http:// www.itu.int/rec/T-REC-H.265-201504-1/en. [0032] [0032] ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need to standardize future video coding technology with a compression capability that exceeds the of the current HEVC standard, including its current and short-term extensions for screen content encoding and high dynamic range encoding. The groups are working together on this exploration activity in a joint collaborative effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology projects proposed by their experts in this area. The JVET met for the first time between October 19 and 21, 2015. And the latest version of reference software i.e. Joint Exploration Model 7 (JEM7) could be downloaded from: https://jvet.hhi.fraunhofer. de/svn/svn HMIEMSoftware/tags/H M-16.6-JEM-7.0/. This algorithm description for JEM7 could be called J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, JJ. Boyce "Algorithm description of Joint Exploration Test Model 7 (JEM7)," JIVET-G1001, Turin, July 2017. 
[0033] [0033] Recently, a new video encoding standard, called the Versatile Video Encoding (VVC) standard, is under development by the Joint Video Expert Team (JVET) of VCEG and MPEG. An initial VVC project is available in document JVET-4J1001 [0034] [0034] Figure 1 is a block diagram illustrating an example video encoding and decoding system 10 that can use the techniques of this disclosure. As shown in Figure 11, the system 10 includes a source device 12 that provides encoded video data which will be decoded at a later time by a destination device 14. In particular, the source device 12 provides the video data to the source device 14. destination 14 via a computer readable medium 16. The source device 12 and the destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook computers (i.e. laptop), tablet computers, set-top boxes, telephone sets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, game consoles, video streaming devices, or the like. In some cases, the source device 12 and the target device 14 may be equipped for wireless communication. [0035] [0035] The destination device 14 can receive the encoded video data that will be decoded through the computer readable medium 16. The computer readable medium 16 may comprise any type of medium or device capable of moving the encoded video data to from source device 12 to target device 14. In one example, computer readable medium 16 may comprise communication means to enable source device 12 to transmit encoded video data directly to target device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wired or wireless communication protocol, and transmitted to the target device 14. The communication medium may comprise any wireless or wired communication medium. wire, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may be part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful in facilitating communication from the source device 12 to the destination device 14. [0036] [0036] In other examples, the computer readable medium 16 may include non-temporary storage media, such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, for example, via network transmission. Similarly, a computing device in a medium production facility, such as a disc stamping facility, can receive encoded video data from the source device. [0037] [0037] In some examples, encoded data may be sent from output interface 22 to a storage device. Similarly, encoded data can be accessed from the storage device through the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard disk, Blu-ray Discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other medium. of digital storage suitable for storing encoded video data. 
In a further example, the storage device can correspond to a file server or other intermediate storage device that can store the encoded video generated by the source device 12. The destination device 14 can access stored video data from the device. storage via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting encoded video data to the target device 14. Examples of file servers include a web server (eg for a website) , an FTP server, network attached storage (NAS) devices, or a local hard drive. The target device 14 can access the encoded video data over any standard data connection, including an Internet connection. This may include a wireless channel (e.g. a Wi-Fi connection), a wired connection (e.g. DSL, cable modem, etc.), or a combination of both that is suitable for accessing stored encoded video data. on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a downloading transmission, or a combination thereof. [0038] [0038] The techniques of this disclosure are not necessarily limited to wireless applications or configurations. The techniques can be applied to video encoding in support of any of a variety of multimedia applications such as terrestrial television broadcasts, cable television broadcasts, satellite television broadcasts, streaming video transmissions over the Internet, such as, dynamic adaptive streaming over HTTP (DASH), digital video that is stored on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video transmission, and/or video telephony. [0039] [0039] In the example of Figure 1, the source device 12 includes video source 18, video encoder and output interface 22. The destination device 14 includes an input interface 28, a dynamic range adjustment unit (DRA ) 19, a video decoder 30 and a display device 32. In accordance with this disclosure, the DRA unit 19 of source device 12 can be configured to implement the techniques of this disclosure, including signaling and related operations applied to data from video in certain color spaces to enable more efficient compression of HDR and WCG video data. In some examples, the DRA unit 19 may be separate from the video encoder 20. In other examples, the DRA unit 19 may be part of the video encoder 20. In other examples, a source device and a destination device include other components or provisions. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Similarly, the target device 14 may interface with an external display device, rather than including an integrated display device. [0040] [0040] The illustrated system 10 of Figure 1 is merely an example. HDR and WCG video data processing and encoding techniques can be performed by any video encoding and/or digital video decoding device. Furthermore, some examples of techniques of this disclosure may also be performed by a video preprocessor and/or video postprocessor. A video preprocessor can be any device configured to process video data before encoding (for example, before HEVC, VVC, or other encoding). A video post processor can be any device configured to process video data after encoding (for example, after HEVC, VVC, or other decoding). 
Source device 12 and target device 14 are merely examples of such encoding devices where source device 12 generates encoded video data for transmission to target device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner so that each of the devices 12, 14 includes video encoding and decoding components, as well as a video preprocessor and a video postprocessor (e.g. DRA unit 19 and DRA unit 31, respectively) Then, the system 10 can support unidirectional or bidirectional video transmission between the video devices 12, 14, for example, for video streaming, video playback, video broadcasting or video telephony. [0041] [0041] Video source 18 of source device 12 may include a video capture device, such as a video camera, a video file containing previously captured video, and/or a video feed interface for receiving video from a video source. video content provider. As a further alternative, the video source 18 may generate computer graphics based data as the source video, or a combination of live video, archived video and computer generated video. In some cases, if the video source 18 is a video camera, the source device 12 and the target device 14 can form so-called camera phones or video phones. As mentioned above, the techniques described in this disclosure may be applicable to video encoding and video processing in general and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured or computer generated video can be encoded by the video encoder [0042] [0042] Input interface 28 of target device 14 receives information from computer readable medium 16. Information from computer readable medium 16 may include syntax information defined by video encoder 20 which is also used by video decoder 30, which include syntax elements that describe features and/or processing of blocks and other coded units, eg groups of images (GOPs). The display device 32 displays the encoded video data to a user, and may comprise any of a number of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) screen or other type of display device. [0043] [0043] Video encoder 20 and video decoder 30 may be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), integrated circuit for specific application (ASICS), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are partially implemented in software, a device may store instructions for the software on a suitable non-temporary computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques in this disclosure. Each video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. [0044] [0044] DRA Unit 19 and Inverse DRA Unit 31 can be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, DSPs, ASICs, FPGAs, discrete logic, software, hardware , firmware or any combination thereof. 
When the techniques are partially implemented in software, a device may store instructions for the software on a suitable non-temporary computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques in this disclosure. [0045] [0045] In some examples, the video encoder 20 and the video decoder 30 may operate according to a video compression standard such as ITU-T H.265/HEVC, VVC, or other video coding standards of close proximity. generation. [0046] [0046] In HEVC and other video encoding standards, a video sequence typically includes a series of images. Images can also be called “frames”. The reconstructed image can include three sample matrices, denoted SL, Scb, and Scr. S', is a two-dimensional array (ie a block) of luma samples. Scv is a two-dimensional array of Cb chrominance samples. Scr Is a two-dimensional array of Cr chrominance samples. Chroma samples may also be referred to in this document as “chroma” samples. In other cases, an image may be monochromatic and may only include an array of luma samples. [0047] [0047] The video encoder 20 can generate a set of coding tree units (CTUs). Each of the CTUs may comprise a tree encoding block of luma samples, two corresponding tree encoding blocks of chroma samples, and syntax structures used to encode the samples of the tree encoding blocks. In a monochrome image or an image that has three separate color planes, a CTU may comprise a single tree-coding block and syntax structures used to encode the tree-coding block samples. A tree-coding block can be an NxN block of samples. A CTU can also be called a “tree block” or a large coding unit (LCU). HEVC CTUs can be broadly analogous to the macroblocks of other video encoding standards, such as H.264/AVC. However, a CTU is not necessarily limited to a specific size and may include one or more encoding units (CUs). A slice can include an integer number of CTUs ordered consecutively in the traced scan. [0048] [0048] This disclosure may use the term “video unit” or “video block” to refer to one or more sample blocks and syntax structures used to encode the samples from the one or more sample blocks. Examples of video unit types might include CTUs, CUs, PUs, transform units (TUs) in HEVC, or macroblocks, macroblock partitions, and so on in other video encoding standards. [0049] [0049] To generate an encoded CTU, the video encoder 20 can recursively perform the quadrant tree partition on the encoding tree blocks of a CTU to divide the encoding tree blocks into encoding blocks, then the name “units” tree coding”. An encoding block is an NXN block of samples. A CU may comprise a luma sample coding block and two corresponding chroma sample coding blocks of an image having a luma sample matrix, a Cb sample matrix, and a Cr sample matrix, and structures syntax used to encode the coding blocks samples. In monochrome images or images that have three separate color planes, a CU may comprise a single encoding block and syntax structures used to encode the encoding block samples. [0050] [0050] The video encoder 20 can partition an encoding block of a CU into one or more prediction blocks. A prediction block can be a rectangular (ie square or non-square) block of samples to which the same prediction is applied. A prediction unit (PU) of a CU may comprise a prediction block of luma samples, two corresponding prediction blocks of chroma samples of an image, and syntax structures used to predict the prediction block samples. 
In a monochrome image OR An image that has three separate color planes, a PU can comprise a single prediction block and syntax structures used to predict the prediction block samples. The video encoder 20 can generate luma, Cb and Cr predictive blocks for luma, Cb and Cr prediction blocks of each PU of the CU. [0051] [0051] In JEM7, instead of using the HEVC quadtree partitioning structure described above, a quadtree binary tree (QTBT) partitioning structure can be used. The QTBT framework removes the concepts of various types of partitions. That is, the QTBT framework removes the separation of CU, PU, and TU concepts, and supports more flexibility for CU partition formats. In the block structure of QOTBT, a CU can have a square or rectangular shape. In one example, a CU is primarily partitioned by a quadtree structure. Quadtree leaf nodes are further partitioned by a binary tree structure. [0052] [0052] In some examples, horizontal two types of division: symmetrical horizontal division and symmetrical vertical division. The leaf nodes of the binary tree are called CUs, and this segmentation (ie the CU) is used for prediction and transform processing without any further partitioning. This means that the CU, PU and TU have the same block size in the QTBT coding block structure. In JEM, a CU sometimes consists of encoding blocks (CBs) of different color components. For example, a CU contains one luma CB and two chroma CBs in the case of P and B slices of 4:2:0 chroma format, and sometimes consists of a single component CB. For example, a CU contains only one luma CB or only two chroma CBs in the case of I slices. [0053] [0053] In some examples, video encoder 20 and video decoder 30 can be configured to operate in accordance with JEM/VVC. According to JEM/VVC, a video encoder (such as video encoder 20) partitions an image into a plurality of CUs. An example JEM QTBT structure includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QOTBT structure corresponds to a CTU. Leaf nodes of binary trees correspond to coding units (CUs). [0054] [0054] In some examples, the video encoder 20 and the video decoder 30 can use a single QTBT structure to represent each of the luminance and chrominance components, while in other examples, the video encoder 20 and the video decoder video 30 can use two or more QTBT frames, such as one QTBT frame for the luminance component and another QTBT frame for both chrominance components (or two QTBT frames for the respective chrominance components). [0055] [0055] Video encoder 20 and video decoder 30 can be configured to use quadtree partitioning by HEVC, QTBT partitioning according to JEM/VVC, or other partitioning structures. For purposes of explanation, the description of the techniques of this disclosure is presented in relation to QTBT partitioning. However, it should be understood that the techniques of this disclosure can also be applied to video encoders configured to use quadtree partitioning, or other types of partitioning as well. [0056] [0056] Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QOTBT) structure 130, and a corresponding coding tree unit (CTU) 132. The solid lines represent the quadtree division, and the dotted lines indicate binary tree splitting. At each split (i.e. leafless) node of the binary tree, a flag is flagged to indicate the type of split (i.e. 
horizontal or vertical) that is used, where O indicates horizontal split and 1 indicates vertical split in this example. For quadtree splitting, there is no need to indicate the split type, as quadtree nodes split a block horizontally and vertically into 4 sub-blocks of equal size. Accordingly, the video encoder 20 can encode, and the video decoder 30 can decode, syntax elements (such as split information) to a QTBT structure region tree level 130 (i.e., the continuous lines) and elements syntax (such as splitting information) for a structure prediction tree level of QOTBT 130 (that is, the dashed lines). Video encoder 20 can encode, and video decoder 30 can decode video data, such as prediction and transform data, for CUs represented by QTBT frame terminal leaf nodes 130. [0057] [0057] In general, the CTU 132 of Figure 2B can be associated with parameters that define block sizes corresponding to QTBT structure nodes 130 in the first and second levels. These parameters can include a CTU size (representing a size of CTU 132 in the samples), a minimum quadtree size (MinQTSize, representing a minimum allowable quadtree leaf node size), a maximum binary tree size (MaxBTSize, representing a maximum allowable binary tree root node size), a maximum binary tree depth (MaxBTDepth, representing a maximum allowable binary tree depth), and a minimum binary tree size (MinBTSize, representing the minimum allowable tree leaf node size binary). [0058] [0058] The root node of a QTBT structure corresponding to a CTU can have four child nodes at the first level of the QTBT structure, each of which can be partitioned according to quadtree partitioning. That is, the first-level nodes are either leaf nodes (which have no child nodes) or have four child nodes. The QTBT structure example 130 represents such nodes as including the parent node and child nodes that have solid lines for branches. If a first-level node is not larger than the maximum allowable binary tree root node size (MaxBTSize), then the node can be further partitioned by the respective binary trees. The binary tree split of a node can be iterated until the nodes resulting from the split reach the minimum allowable binary tree leaf node size (MinBTSize) or the maximum allowable binary tree depth (MaxBTDepth). The example QTBT structure 130 represents such nodes as having dashed lines for branches. The leaf node of the binary tree is called an encoding unit (CU), which is used for prediction (eg intra-picture or inter-picture prediction) and transformed, without any further partitioning. As discussed above, CUs can also be called "video blocks" or "blocks". [0059] [0059] In an example of the QTBT partitioning structure, the CTU size is set to 128x128 (luma samples and two corresponding chroma samples of 64x64), the MinQTSize is set to l16x16, the MaxBTSize is set to 64x64, the MinBTSize (for width and height) is set to 4, and MaxBTDepth is set to 4. Quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. Quadtree leaf nodes can have a size from l6x16 (ie the MinQTSize) to 128x128 (ie the size of CTU). If the quadtree leaf node is 128x128, then the node will be further split by the binary tree as the size exceeds the MaxBTSize (ie 64x64 in this example). Otherwise, the quadtree leaf node will be further partitioned by the binary tree. Therefore, the leaf node of the quadtree is also the root node for the binary tree and has the binary tree depth as 0. 
When a binary tree depth reaches MaxBTDepth (4 in this example), no further splitting is allowed. The binary tree node that has width equal to MinBTSize (4, in this example) implies that no further horizontal splitting is allowed. Similarly, a binary tree node that has a height equal to MinBTSize implies that no further vertical division is allowed for that binary tree node. As noted above, the leaf nodes of the binary tree are called CcUs, and are further processed according to the prediction and transform without further partitioning. [0060] [0060] The video encoder 20 can use intraprediction or interprediction to generate the predictive blocks for a PU. If the video encoder 20 uses intraprediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of the image associated with the PU. [0061] [0061] If the video encoder 20 uses interprediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of one or more images except the image associated with the PU. The interprediction can be a unidirectional interprediction (ie, uniprediction) or a bidirectional interprediction (ie, biprediction). To perform uniprediction or biprediction, the video encoder 20 can generate a first reference image list (RefPicListO) and a second reference image list (RefPicListl) for a current slice. [0062] [0062] Each of the reference image lists can include one or more reference images. [0063] [0063] When using biprediction to encode a PU, the video encoder 20 can determine a first reference location in a reference image in RefPicList and a second reference location in a reference image in RefPicListl . The video encoder 20 can then generate, based at least in part on the samples corresponding to the first and second reference locations, the predictive blocks for the PU. In addition, when using biprediction to encode the PU, the video encoder 20 may generate a first motion indicating a spatial offset between a sample block of the PU and the first reference location and a second motion indicating a spatial offset between the PU sample block and the first reference location. of PU prediction and the second reference location. [0064] [0064] In some examples, JEM/VVC also provides an affine motion compensation mode, which can be considered an interprediction mode. In affine motion compensation mode, the video encoder 20 can determine two or more motion vectors that represent non-translational motion, such as zooming in or out, rotation, perspective motion, or other types of irregular motion. [0065] [0065] After the video encoder 20 generates predictive luma blocks Cb and Cr for one or more PUs of a CU, the video encoder 20 can generate a residual luma block for the CU. Each sample in the residual CU luma block indicates a difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the original CU luma encoding block. Furthermore, the video encoder 20 can generate a residual block of Cb for the CU. Each sample in the residual Cb block of the CU can indicate a difference between a sample of Cb in one of the CU's predictive Cb blocks and a corresponding sample in the original Cb coding block of the CU. The video encoder 20 can also generate a residual block of Cr for the CU. 
Each sample in the residual Cr block of CU can indicate a difference between a sample of Cr in one of the predictive Cr blocks of CU and a corresponding sample in the original Cr-encoding block of CU. [0066] [0066] In addition, the video encoder 20 can use the quadrant tree partition to decompose the residual luma, Cb, and Cr blocks of a CU into one or more luma, Cb, and Cr transform blocks. A transform block can be a rectangular block of samples to which the same transform is applied. A transform unit (TU) of a CU may comprise a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the transform block samples. In a monochrome image, or an image that has three separate color planes, a TU can comprise a single transform block and syntax structures used to transform the transform block samples. Thus, each TU of a CU can be associated with a luma transform block, a Cb transform block and a Cr transform block. The luma transform block associated with the TU may be a sub-block of the residual luma block of the CU. The transform block of Cb may be a sub-block of the residual Cb block of the CU. The Cr transform block can be a subblock of the residual Cr block of the CU. [0067] [0067] The video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. A coefficient block can be a two-dimensional array of transform coefficients. A transform coefficient can be a scalar quantity. Video encoder 20 may apply one or more transforms to a Cb transform block of a TU to generate a Cb coefficient block for the TU. Video encoder 20 may apply one or more transforms to a Cr transform block of a TU to generate a Cr coefficient block for the TU. [0068] [0068] After generating a coefficient block (eg, a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), the video encoder 20 can quantize the coefficient block. In general, quantization refers to a process where transform coefficients are quantized possibly to reduce the amount of data used to represent transform coefficients, providing additional compression. In addition, the video encoder 20 can inversely quantize the transform coefficients and apply an inverse transform to the transform coefficients to reconstruct the TU transform blocks of CUs of a picture. The video encoder 20 can use the reconstructed transform blocks of TUs of a CU and the predictive blocks of PUs of the CU to reconstruct the encoding blocks of the CU. By reconstructing the coding blocks of each CU of an image, the video encoder 20 can reconstruct the image. The video encoder can store reconstructed images in a decoded image buffer (DPB). The video encoder 20 can use images reconstructed in DPB for interprediction and intraprediction. [0069] [0069] After the video encoder 20 quantizes a coefficient block, the video encoder 20 can perform entropy coding of syntax elements indicating the quantized transform coefficients. For example, the video encoder 20 can perform Context Adaptive Binary Arithmetic Coding (CAB AC) on the syntax elements indicating the quantized transform coefficients. The video encoder 20 can send the entropy encoded syntax elements in a bit stream. [0070] [0070] The video encoder 20 may send a bit stream that includes a sequence of bits that forms a representation of encoded images and associated data. The bit stream may comprise a sequence of network abstraction layer (NAL) units. 
Each NAL unit includes an NAL unit header and encapsulates a raw byte sequence (RBSP) payload. The NAL unit header can include a syntax element that indicates an NAL unit type code. The NAL unit type code specified by the NAL unit header of an NAL unit indicates the type of the NAL unit. An RBSP can be a syntax structure containing an integer number of bytes that is encapsulated within an NAL unit. In some cases, an RBSP includes zero bits. [0071] [0071] Different types of NAL units can encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate an RBSP for a picture parameter set (PPS), a second type of NAL unit may encapsulate an RBSP for an encoded slice, a third type of NAL unit may encapsulate an RBSP for Supplemental Enhancement Information (SEI), and so on. A PPS is a syntax structure that can contain syntax elements that apply to zero or more entire encoded images. [0072] [0072] Video decoder 30 can receive a bit stream. In addition, the video decoder 30 can analyze the bitstream to decode syntax elements of the bitstream. The video decoder 30 can reconstruct the images of the video data based at least in part on the decoded syntax elements from the bit stream. The process for reconstructing the video data may be generally reciprocal to the process performed by the video encoder 20. For example, the video decoder 30 may use PU motion vectors to determine predictive blocks for the PUs of a current CU. The video decoder 30 can use a motion vector or motion vectors from PUs to generate predictive blocks for the PUs. [0073] [0073] In addition, the video decoder 30 can inversely quantize the coefficient blocks associated with TUs of the current CU. The video decoder 30 can perform inverse transforms on the coefficient blocks to reconstruct the transform blocks associated with the TUs of the current CU. The video decoder 30 can reconstruct the coding blocks of the current CU by adding samples from the predictive sample blocks for PUs of the current CU to corresponding samples of the transform blocks of the TUs of the current CU. By reconstructing the coding blocks for each CU of an image, the video decoder 30 can reconstruct the image. Video decoder 30 may store decoded images in a decoded image buffer for output and/or for use in decoding other images. [0074] [0074] Next-generation video applications are anticipated to operate with video data representing the scene captured with HDR and/or (WCG. The used dynamic range and color gamut parameters are two independent attributes of video content , and its specification for purposes of digital television and multimedia services are defined by several international standards. For example, ITU-R Rec. BT.709, "Parameter values for the HDTV standards for production and international program exchange", defines parameters for HDTV (high definition television) such as standard dynamic range (SDR) and standard color gamut, and ITU-R Rec. BT.2020, "Parameter values for ultra-high definition television systems for production and international program exchange", specifies parameters UHDTV (ultra-high definition television) such as HDR and WCG. There are also other standards development organization (SDOSS) documents that specify dynamic range and video range attributes. colors in other systems, for example, the DCI-P3 color gamut is defined in SMPTE-231-2 (Society of Motion Picture and Television Engineers) and some HDR parameters are defined in SMPTE-2084. 
A brief description of dynamic range and color gamut for video data is provided below. [0075] [0075] Dynamic range is typically defined as the ratio between the minimum and maximum brightness (eg luminance) of the video signal. Dynamic range can also be measured in terms of 'f-number scale', where an f-number scale corresponds to a doubling of a signal's dynamic range. In the MPEG definition, HDR content is content that varies in brightness with more than 16 f-number scales. In some terms, levels between 10 and 16 f-number scales are considered intermediate dynamic range, but in other definitions they are considered HDR. In some examples of this disclosure, HDR video content can be any video content that has a greater dynamic range than video content traditionally used with a standard dynamic range (e.g., video content as specified by ITU-R Rec. BT.709). [0076] [0076] The human visual system (HVS) is capable of perceiving much greater dynamic ranges than SDR content and HDR content. However, the HVS includes an adaptation mechanism to narrow the dynamic range of the HVS to a so-called simultaneous range. The width of the simultaneous range may be dependent on current lighting conditions (eg current brightness). The dynamic range visualization provided by HDTV SDR, UHDTV Dynamic Range Expected HDR, and HVS is shown in Figure 3, although the exact range may vary based on each individual and display. [0077] [0077] Some examples of video applications and services are regulated by ITU Rec.709 and provide [0078] [0078] Another aspect of a more realistic video experience, besides HDR, is the color dimension. The color dimension is typically defined by the color gamut. Figure 4 is a conceptual diagram showing a color gamut for SDR (triangle 100 based on BT.709 primary colors), and a larger color gamut than for UHDTV (triangle 102 based on BT.2020 primary colors). Figure 3 also shows the so-called spectrum location (delimited by the tongue-shaped area 104), representing the limits of natural colors. As illustrated by Figure 3, the switch from BT.709 (triangle 100) primary colors to BT.2020 (triangle 102) aims to provide UHDTV services with about 70% more colors. D65 specifies a white color example for the BT.709 and/or BT.2020 specifications. [0079] [0079] Examples of color gamut specifications for the DCI-P3, BT.709, and BT.202 color spaces are shown in Table 1. [0080] [0080] As can be seen in Table 1, a range of colors can be defined by the X and Y values of a white point, and by the X and Y values of the primary colors (for example, red (R), green (G) and blue (B). The X and Y values represent the chromaticity (X) and brightness (Y) of colors as defined by the CIE color space [0081] [0081] HDRAVCG video data is typically captured and stored at very high precision per component (regular floating point), with 4:4:4 chroma subsampling format and very wide color space (e.g. CIE XYZ ). This representation aims at high precision and is almost mathematically lossless. However, such a format for storing HDRAVCG video data may include a lot of redundancy and may not be ideal for compression purposes. A lower precision format with assumptions based on HVS is typically used for high-end video applications. [0082] [0082] An example of a video data format conversion process for compression purposes includes three main processes, as shown in Figure 5. The techniques of Figure 5 can be performed by source device 12. 
Linear RGB data 110 can be HDR/WCG video data and can be stored in a floating point representation. [0083] The inverse conversion on the decoder side is represented in Figure 6. The techniques in Figure 6 can be performed by the target device 14. [0084] The techniques represented in Figure 5 will be discussed in more detail. Mapping the digital values that appear in an image container to and from optical energy may involve the use of a "transfer function". In general, a transfer function is applied to data (e.g., HDR/WCG video data) to compress the dynamic range of the data. Such compression allows the data to be represented with fewer bits. In one example, the transfer function may be a non-linear one-dimensional (1D) function and may reflect the inverse of an electro-optical transfer function (EOTF) of the end-user display, for example, as specified for SDR in ITU-R BT.1886 (also defined in Rec. 709). In another example, the transfer function can approximate the perception of the HVS to changes in brightness, e.g., the PQ transfer function specified in SMPTE-2084 for HDR. The reverse process of the OETF is the EOTF (electro-optical transfer function), which maps code levels back to luminance. Figure 7 shows several examples of non-linear transfer functions used to compress the dynamic range of certain color containers. Transfer functions can also be applied to each R, G and B component separately.

[0085] The reference EOTF specified in ITU-R Recommendation BT.1886 is defined by the equation:

L = a * (max[(V + b), 0])^γ

where:
L: screen luminance in cd/m²
Lw: screen luminance for white
LB: screen luminance for black
V: input video signal level (normalized, black at V = 0, white at V = 1). For content mastered per ITU-R Recommendation BT.709, 10-bit digital code values D are mapped into V values by the equation V = (D − 64)/876
γ: exponent of the power function, γ = 2.404
a: variable for user gain (control of the inherited "contrast")
b: variable for user black level lift (control of the inherited "brightness")

[0086] The above variables a and b are derived by solving the following equations so that V = 1 gives L = Lw, and V = 0 gives L = LB:

LB = a * b^γ
Lw = a * (1 + b)^γ

[0087] To support higher dynamic range data more efficiently, SMPTE has recently standardized a new transfer function called SMPTE ST-2084. The ST-2084 specification defined the application of the EOTF as follows. A TF is applied to normalized linear R, G, B values, which results in a non-linear representation of R'G'B'. ST-2084 sets the normalization to NORM = 10000, which is associated with a peak brightness of 10000 nits (cd/m²):

R' = PQ_TF(max(0, min(R/NORM, 1)))
G' = PQ_TF(max(0, min(G/NORM, 1)))
B' = PQ_TF(max(0, min(B/NORM, 1)))

with PQ_TF(L) = ((c1 + c2 * L^m1) / (1 + c3 * L^m1))^m2

[0088] Typically, an EOTF is defined as a function with floating point precision, so no error is introduced into a signal with this non-linearity if an inverse TF (called OETF) is applied. The inverse TF (OETF) specified in ST-2084 is defined as an inversePQ function:

R = 10000 * inversePQ_TF(R')
G = 10000 * inversePQ_TF(G')
B = 10000 * inversePQ_TF(B')

with inversePQ_TF(N) = (max[(N^(1/m2) − c1), 0] / (c2 − c3 * N^(1/m2)))^(1/m1)

where:
m1 = 2610/4096 * 1/4 = 0.1593017578125
m2 = 2523/4096 * 128 = 78.84375
c1 = c3 − c2 + 1 = 3424/4096 = 0.8359375
c2 = 2413/4096 * 32 = 18.8515625
c3 = 2392/4096 * 32 = 18.6875
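For reference, the ST-2084 equations above translate directly into code. The short Python sketch below evaluates PQ_TF and inversePQ_TF with the constants m1, m2, c1, c2 and c3 given in the text; it is only a restatement of those formulas, not part of the disclosed encoder or decoder.

```python
# Direct transcription of the ST-2084 (PQ) transfer function pair.
m1 = 2610.0 / 4096.0 / 4.0      # 0.1593017578125
m2 = 2523.0 / 4096.0 * 128.0    # 78.84375
c1 = 3424.0 / 4096.0            # 0.8359375 (= c3 - c2 + 1)
c2 = 2413.0 / 4096.0 * 32.0     # 18.8515625
c3 = 2392.0 / 4096.0 * 32.0     # 18.6875
NORM = 10000.0                  # peak brightness in cd/m^2

def pq_tf(L):
    """Normalized linear light L in [0, 1] (i.e. R/NORM) -> non-linear PQ value."""
    L = max(0.0, min(1.0, L))
    return ((c1 + c2 * L ** m1) / (1.0 + c3 * L ** m1)) ** m2

def inverse_pq_tf(N):
    """Non-linear PQ value N in [0, 1] -> normalized linear light in [0, 1]."""
    N = max(0.0, min(1.0, N))
    p = N ** (1.0 / m2)
    return (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1.0 / m1)

# Example: a 100 cd/m^2 linear sample survives the round trip R -> R' -> R.
R = 100.0
R_prime = pq_tf(R / NORM)
assert abs(NORM * inverse_pq_tf(R_prime) - R) < 1e-4
```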
[0089] Note that the EOTF and OETF are a subject of very active research and standardization, and the TF used in some video encoding systems may be different from ST-2084. [0090] In the context of this disclosure, the terms "signal value" or "color value" may be used to describe a luminance level corresponding to the value of a specific color component (such as R, G, B or Y) for an image element. The signal value is typically representative of a linear light level (luminance value). The terms "code level" or "digital code value" may refer to a digital representation of an image signal value. Typically, such a digital representation is representative of a non-linear signal value. An EOTF represents the relationship between the non-linear signal values supplied to a display device (e.g., display device 32) and the linear color values produced by the display device.

[0091] RGB data is typically used as the input color space, as RGB is the type of data that is typically produced by image capture sensors. However, the RGB color space has high redundancy among its components and is not ideal for a compact representation. To obtain a more compact and more robust representation, the RGB components are typically converted (e.g., a color transformation is performed) to a more uncorrelated color space that is better suited for compression, for example YCbCr. A YCbCr color space separates brightness in the form of luminance (Y) and color information (CrCb) into different, less correlated components. In this context, a robust representation can refer to a color space with higher levels of error resilience when compressed at a restricted bitrate.

[0092] For modern video encoding systems, a typically used color space is YCbCr, as specified in ITU-R BT.709. The YCbCr color space in the BT.709 standard specifies the following conversion process from R'G'B' to Y'CbCr (non-constant luminance representation):

a. Y' = 0.2126 * R' + 0.7152 * G' + 0.0722 * B'
b. Cb = (B' − Y') / 1.8556
c. Cr = (R' − Y') / 1.5748

[0093] The above can also be implemented using the following approximate conversion that avoids the division for the Cb and Cr components:

a. Y' = 0.212600 * R' + 0.715200 * G' + 0.072200 * B'
b. Cb = −0.114572 * R' − 0.385428 * G' + 0.500000 * B'
c. Cr = 0.500000 * R' − 0.454153 * G' − 0.045847 * B'

[0094] The ITU-R BT.2020 standard specifies the following conversion process from R'G'B' to Y'CbCr (non-constant luminance representation):

a. Y' = 0.2627 * R' + 0.6780 * G' + 0.0593 * B'
b. Cb = (B' − Y') / 1.8814
c. Cr = (R' − Y') / 1.4746

[0095] The above can also be implemented using the following approximate conversion that avoids the division for the Cb and Cr components:

a. Y' = 0.262700 * R' + 0.678000 * G' + 0.059300 * B'
b. Cb = −0.139630 * R' − 0.360370 * G' + 0.500000 * B'
c. Cr = 0.500000 * R' − 0.459786 * G' − 0.040214 * B'

[0096] After the color transformation, input data in a target color space may still be represented at high bit depth (e.g., floating point precision). The high bit depth data can be converted to a target bit depth, for example, using a quantization process. Certain studies show that 10- to 12-bit accuracy in combination with the PQ transfer function is sufficient to deliver HDR data of 16 f-number scales with distortion below the minimum perceptible difference (JND). In general, a JND is the amount (e.g., of video data) that must be changed for a difference to be noticeable (e.g., by the HVS). Data represented with 10-bit precision can be further encoded with most state-of-the-art video encoding solutions. This quantization is a lossy encoding element and is a source of imprecision introduced into the converted data.

[0097] An example of quantization applied to codewords in the target color space (in this example, YCbCr) is shown below. Input YCbCr values represented in floating point precision are converted to a fixed bit depth signal, BitDepthY for the Y value and BitDepthC for the chroma values (Cb, Cr).
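A compact sketch of the conversions above, together with the fixed bit depth quantization referred to in paragraph [0097], is given below in Python. The matrix coefficients are the approximate forms listed in paragraphs [0093] and [0095]; the final integer mapping assumes the conventional limited-range scaling (219·Y' + 16 and 224·C + 128, scaled by 2^(BitDepth − 8)), which is common practice rather than a quotation from this text.

```python
# Sketch: non-linear R'G'B' -> Y'CbCr (non-constant luminance) followed by
# conversion of the floating point Y'CbCr values to fixed bit depth codewords.
# The 219/224 limited-range scaling is an assumption based on common practice.

BT709  = (( 0.212600,  0.715200,  0.072200),
          (-0.114572, -0.385428,  0.500000),
          ( 0.500000, -0.454153, -0.045847))

BT2020 = (( 0.262700,  0.678000,  0.059300),
          (-0.139630, -0.360370,  0.500000),
          ( 0.500000, -0.459786, -0.040214))

def rgb_to_ycbcr(r, g, b, m=BT2020):
    """R'G'B' in [0, 1] -> (Y' in [0, 1], Cb and Cr in [-0.5, 0.5])."""
    y  = m[0][0] * r + m[0][1] * g + m[0][2] * b
    cb = m[1][0] * r + m[1][1] * g + m[1][2] * b
    cr = m[2][0] * r + m[2][1] * g + m[2][2] * b
    return y, cb, cr

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def to_fixed_point(y, cb, cr, bit_depth_y=10, bit_depth_c=10):
    """Float Y'CbCr -> limited-range integer codewords (e.g. DY' in [64, 940] at 10 bits)."""
    sy = 1 << (bit_depth_y - 8)
    sc = 1 << (bit_depth_c - 8)
    dy  = clip(round(sy * (219.0 * y  +  16.0)), 0, (1 << bit_depth_y) - 1)
    dcb = clip(round(sc * (224.0 * cb + 128.0)), 0, (1 << bit_depth_c) - 1)
    dcr = clip(round(sc * (224.0 * cr + 128.0)), 0, (1 << bit_depth_c) - 1)
    return dy, dcb, dcr

# Example: mid-grey R' = G' = B' = 0.5 maps to Y' = 0.5, Cb = Cr = 0,
# i.e. codewords (502, 512, 512) at 10 bits.
print(to_fixed_point(*rgb_to_ycbcr(0.5, 0.5, 0.5)))
```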
[0098] A rate-distortion optimized quantizer (RDOQ) will now be described. Most state-of-the-art video coding solutions (e.g., HEVC and the VVC under development) are based on the so-called hybrid video coding scheme, which basically applies scalar quantization to the transform coefficients of the residual signal, which is produced, in turn, by applying temporal or spatial prediction between the currently coded video signal and the reference picture(s) available at the decoder side. Scalar quantization is applied at the encoder side (e.g., video encoder 20) and inverse scalar dequantization is applied at the decoder side (e.g., video decoder 30). Lossy scalar quantization introduces distortion into the reconstructed signal and requires a certain number of bits to provide the quantized transform coefficients, as well as the description of the coding modes, to the decoder side. [0099] During the evolution of video compression techniques, several approaches aimed at improving the computation of the quantized coefficients were developed. One approach is Rate Distortion Optimized Quantization (RDOQ), which is based on roughly estimating the RD cost of modifying or removing a selected transform coefficient or group of transform coefficients. The purpose of RDOQ is to find the optimal, or a near-optimal, set of quantized transform coefficients that represent the residual data in a coded block. RDOQ calculates the image distortion (introduced by quantizing the transform coefficients) in a coded block and the number of bits needed to code the corresponding quantized transform coefficients. Based on these two values, the encoder selects the best coefficient value by calculating the RD cost. [0100] The RDOQ in the encoder can include 3 stages: the quantization of the transform coefficients, the elimination of coefficient groups (CG), and the selection of the last non-zero coefficient. In the first stage, the video encoder quantizes the transform coefficients with a uniform quantizer with no dead zone, which results in the calculation of the value Level for the current transform coefficient. After that, the video encoder considers two additional magnitudes for this quantized coefficient: Level − 1 and 0. For each of these 3 options (Level, Level − 1, 0), the video encoder calculates the RD cost of coding the coefficient with the selected magnitude and chooses the one with the lowest RD cost. In addition, some RDOQ implementations may consider zeroing out a transform coefficient group completely, or reducing the size of the significant transform coefficient group by reducing the position of the last significant coefficient of each of the groups. At the decoder side, inverse scalar quantization is applied to the quantized transform coefficients derived from the bitstream syntax elements. [0101] Some existing transfer functions and color transformations used in video coding may result in a representation of the video data that exhibits significant variation of the Minimum Perceptible Difference (JND) threshold values over the dynamic range of the signal representation. That is, some ranges of codeword values for the luma and/or chroma components may have different JND threshold values than other ranges of codeword values for the luma and/or chroma components. For such representations, a quantization scheme that is uniform over the dynamic range of luma values (e.g., uniform over all luma codeword values) would introduce quantization error whose impact on human perception differs over fragments of the signal (partitions of the dynamic range).
Such an impact on the signal can be interpreted by a viewer as processing by a system with non-uniform quantization, which results in uneven signal-to-noise ratios within the processed data range. [0102] An example of such a representation is a video signal represented in the Non-Constant Luminance (NCL) YCbCr color space whose primary colors are defined in ITU-R Rec. BT.2020 and with the ST-2084 transfer function. As illustrated in Table 2, the NCL YCbCr color space allocates significantly more codewords to low-intensity signal values, for example, 30% of the codewords represent linear light samples < 10 nits, while the high-intensity (high-brightness) samples are represented with a much smaller number of codewords, e.g., 25% of the codewords are allocated to linear light in the range of 1000 to 10000 nits. As a result, a video coding system, for example H.265/HEVC, with uniform quantization over all bands of the data would introduce much more severe coding artifacts in the high-intensity samples (light region of the signal), while the distortion introduced in the low-intensity samples (dark region of the same signal) could be far below the perceptible difference. [0103] Effectively, this means that the video coding system design, or the coding algorithms, can benefit from tuning for each selected video data representation, i.e., for each selected transfer function and color space. Previously, the following methods were proposed to solve the problems of non-ideal perceptual codeword distribution described above. [0104] In "Dynamic Range Adjustment SEI to enable High Dynamic Range video coding with Backward-Compatible Capability," D. Rusanovskyy, A. K. Ramasubramonian, D. Bugdayci, S. Lee, J. Sole, M. Karczewicz, VCEG document COM16-C 1027-E, Sep. 2015, the authors propose to apply a codeword redistribution to the video data prior to video coding. Video data in the ST-2084/BT.2020 representation undergoes a codeword redistribution prior to video compression. The redistribution introduces a linearization of the perceived distortion (signal-to-noise ratio) within the dynamic range of the data through Dynamic Range Adjustment. The redistribution has been found to improve visual quality under bitrate constraints. To compensate for the redistribution and convert the data back to the original ST-2084/BT.2020 representation, an inverse process is applied to the data after video decoding. [0105] One of the disadvantages of this approach is the fact that the pre-processing and post-processing are generally decoupled from the rate-distortion optimization processing employed by high-end encoders on a block basis. Therefore, the technique described in VCEG document COM16-C 1027-E does not employ information available to the decoder, such as the quantization distortion introduced into the target frame by the quantization scheme of the video codec. [0106] In "Performance investigation of high dynamic range and wide color gamut video coding techniques," J. Zhao, S.-H. Kim, A. Segall, K. Misra, VCEG document COM16-C 1030-E, September 2015, a spatially variable, intensity-dependent (block-based) quantization scheme was proposed to align the bitrate allocation and the visually perceived distortion between video coding applied to the Y2020 (ST-2084/BT.2020) and Y709 (BT.1886/BT.709) representations.
It was observed that, in order to maintain the same level of quantization in the luma components, the quantization of the signal in Y2020 and Y709 differs by a value that depends on the luma, such that:

QP_Y2020 = QP_Y709 − f(Y2020)

[0107] The function f(Y2020) was found to be approximately linear in the video intensity (brightness level) values of Y2020, and could be approximated as: [0108] It was found that the proposed spatially variable quantization scheme introduced at the encoding stage is able to improve the visually perceived quantization signal-to-noise ratio of the coded video signal in the ST-2084/BT.2020 representation. [0109] A disadvantage of this approach is the block-based granularity of the QP adaptation. The block sizes typically selected at the encoder side for compression are derived through a rate-distortion optimization process, and may not represent the dynamic range properties of the video signal, so the selected QP settings will be suboptimal for the signal inside the block. This issue may become even more important for next-generation video coding systems, which tend to employ prediction and transform block sizes of larger dimensions. Another aspect of this design is the need to signal the QP adaptation parameters to the decoder side for inverse dequantization. Additionally, spatial adaptation of the quantization parameters at the encoder side increases the complexity of the encoding optimization and can interfere with rate control algorithms. [0110] In "Intensity dependent spatial quantization with application in HEVC," Matteo Naccari and Marta Mrak, in Proc. of IEEE ICME 2013, July 2013, a perceptual mechanism of Intensity Dependent Spatial Quantization (IDSQ) was proposed. IDSQ exploits the intensity masking of the human visual system and perceptually adjusts the quantization of the signal at the block level. The authors of this document propose to employ in-loop pixel-domain scaling. The in-loop scaling parameters of a currently processed block are derived from the average value of the luma component in the predicted block. At the decoder side, inverse scaling is performed, and the decoder derives the scaling parameters from the predicted block available at the decoder side. [0111] Similar to the techniques in "Performance investigation of high dynamic range and wide color gamut video coding techniques," the block-based granularity of this approach constrains the performance of this method due to a suboptimal scaling parameter, which is applied to all samples of the processed block. Another aspect of the proposed solution is that the scale value is derived from the predicted block and does not reflect the signal fluctuation that may occur between the current block and the predicted block. [0112] "De-quantization and scaling for next generation containers," J. Zhao, A. Segall, S.-H. Kim, K. Misra (Sharp), JVET document B0054, January 2016, addresses the problem of non-uniform perceived distortion in ST-2084/BT.2020 representations. The authors proposed to employ in-loop, intensity-dependent, block-based transform-domain scaling. The in-loop scaling parameters for selected transform coefficients (the AC coefficients) of the currently processed block are derived as a function of the average values of the luma components in the predicted block and of a DC value derived for the current block.
On the decoder side, inverse scaling is performed, and the decoder derives the AC coefficient scaling parameters from the predicted block available at the decoder side and from a quantized DC value that is signaled to the decoder. [0113] Similar to the techniques in "Performance investigation of high dynamic range and wide color gamut video coding techniques" and "Intensity dependent spatial quantization with application in HEVC," the block-based granularity of this approach constrains the performance of this method due to the suboptimality of the scaling parameter, which is applied to all samples of the processed block. Another aspect of the proposed solution is that the scale value is applied only to the AC transform coefficients. Therefore, the improvement in signal-to-noise ratio does not affect the DC value, which reduces the performance of the scheme. Also, in some video coding system designs, the quantized DC value may not be available at the time of scaling of the AC values, for example, in the case where the quantization process follows a cascade of transform operations. Another constraint of this proposal is that when the encoder selects the transform skip or transform/quantization bypass modes for the current block, scaling is not applied (and so, at the decoder, inverse scaling is not defined for the transform skip and transform/quantization bypass modes), which is suboptimal due to the exclusion of a potential coding gain for these two modes. [0114] In US Patent Application No. 15/595,793, filed May 15, 2017, in-loop sample processing for video signals with non-uniformly distributed JND was described. That Patent Application describes scaling and offsetting of signal samples represented in the pixel, residual, or transform domains. Several algorithms for deriving the scale and the offset were proposed. [0115] This disclosure describes various video coding and processing techniques that can be applied in the coding loop of a video coding system (e.g., during the video encoding and/or decoding process, and not in pre- or post-processing). The techniques of this disclosure include encoder-side algorithms (e.g., for video encoder 20) with content-adaptive, spatially variable quantization without explicit signaling of quantization parameters (e.g., a change in a quantization parameter represented by a deltaQP syntax element) to more efficiently compress HDR/WCG video signals. The techniques of this disclosure also include decoder-side operations (e.g., for video decoder 30) that improve the performance of video decoding tools that use quantization parameter information. Examples of such decoding tools may include deblocking filters, bilateral filters, adaptive loop filters, or other video coding tools that use quantization information as an input. [0116] Video encoder 20 and/or video decoder 30 may be configured to perform one or more of the following techniques independently, or in any combination with the others. [0117] In one example of the disclosure, the video encoder 20 can be configured to perform a multi-stage quantization process for each block of video data in a picture of video data. The techniques described below can be applied to both the luma and the chroma components of the video data. The video encoder 20 can be configured to perform quantization using a base quantization parameter value (QPb). That is, the value of QPb is applied uniformly to all blocks.
For a given base quantization parameter value (QPb) provided to the transform quantization that will be applied to the samples s(Cb) of the coded block Cb, the video encoder 20 can be further configured to use a content-dependent QP offset as a deviation from the QPb value. That is, for each block of video data, or for a group of blocks of video data, the video encoder 20 can additionally determine a QP offset that is based on the content of the block or group of blocks. [0118] In this way, the video encoder can account for a rate-distortion optimized (RDO) selection of a quantization level LevelX, which is produced by an effectively different quantization parameter (QPe). In this disclosure, QPe may be called an effective quantization parameter. QPe is the QP offset (deltaQP) plus the base QPb value. The video encoder 20 can derive QPe for a current block Cb using the following equation:

QPe(Cb) = QPb(Cb) + deltaQP(s(Cb)), with deltaQP > 0    (1)

[0119] In the video decoder 30, the quantized transform coefficients tq(Cb) are subjected to inverse quantization with the base quantization parameter QPb. The video decoder 30 can derive the base quantization parameter QPb from a syntax element that is associated with the current block Cb. The video decoder 30 can receive the syntax element in an encoded video bitstream. The video decoder 30 may then perform one or more inverse transforms on the inverse quantized transform coefficients to create a decoded residual. The video decoder 30 can then perform a prediction process (e.g., inter-prediction or intra-prediction) to produce decoded samples d(Cb) for the current block Cb. [0120] Note that the video decoder 30 does not use the effective quantization parameter QPe when the residual values of the block are reconstructed. As a result, the distortion introduced by the video encoder 20 when QPe is applied during encoding remains in the residuals, thus ameliorating the irregular JND threshold problems with certain color spaces, as discussed above. However, considering that the residual signal carries the distortion introduced by the quantization parameter QPe, which is greater than the QPb value that is communicated in the bitstream and associated with the current Cb, other decoding tools (e.g., loop filtering, entropy decoding, etc.) that rely on the QP parameters provided by the bitstream to tune their operation can be adjusted to improve their performance. This adjustment is made by providing the coding tools in question with an estimate of the actual QPe that was applied by the video encoder 20 to the Cb. As will be explained in more detail below, the video decoder 30 can be configured to derive an estimate of the effective quantization parameter QPe from the statistics of the decoded samples d(Cb) and from other bitstream parameters. In this way, bit overhead is avoided, since block-by-block values of QPe are not signaled in the bitstream. [0121] The following sections provide non-limiting examples of implementations of the techniques of this disclosure. Initially, an example of a video encoder 20 structure for an encoder-side algorithm will be described. [0122] Figure 8 is a block diagram illustrating an example video encoder 20 that can implement the techniques of this disclosure. As shown in Figure 8, the video encoder 20 receives a current video block of video data within a video frame to be encoded.
In accordance with the techniques of this disclosure, the video data received by the video encoder 20 may be HDR and/or WCG video data. In the example of Figure 8, the video encoder 20 includes a mode selection unit 40, video data memory 41, DPB 64, adder 50, transform processing unit 52, quantization unit 54 and entropy encoding unit 56. The mode selection unit 40, in turn, includes the motion compensation unit 44, motion estimation unit 42, intra-prediction processing unit 46 and partition unit 48. [0123] The video data memory 41 can store video data that will be encoded by the components of the video encoder 20. The video data stored in the video data memory 41 can be obtained, for example, from the video source 18. The decoded picture buffer 64 may be a reference picture memory that stores reference video data for use in encoding video data by the video encoder 20, for example, in intra- or inter-coding modes. The video data memory 41 and the decoded picture buffer 64 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory 41 and the decoded picture buffer 64 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory 41 may be on-chip with the other components of the video encoder 20, or off-chip with respect to those components. [0124] During the encoding process, the video encoder 20 receives a frame or slice of video that will be encoded. The frame or slice can be divided into multiple video blocks. The motion estimation unit 42 and the motion compensation unit 44 perform inter-predictive coding of the received video block with respect to one or more blocks in one or more reference frames to provide temporal prediction. The intra-prediction processing unit 46 may alternatively perform intra-predictive coding of the received video block with respect to one or more neighboring blocks in the same frame or slice as the block to be encoded, to provide spatial prediction. The video encoder 20 can perform multiple encoding passes, for example, to select a suitable encoding mode for each block of video data. [0125] In addition, the partition unit 48 can partition the blocks of video data into sub-blocks based on the evaluation of previous partitioning schemes in previous encoding passes. For example, the partition unit 48 can initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (e.g., rate-distortion optimization). The mode selection unit 40 may additionally produce a quadtree data structure indicative of the partitioning of an LCU into sub-CUs. Quadtree leaf-node CUs can include one or more PUs and one or more TUs. In other examples, the partition unit 48 may partition the input video data according to a QTBT partitioning structure. [0126] The mode selection unit 40 can select one of the coding modes, intra or inter, for example, based on error results, and provide the resulting intra- or inter-coded block to the adder 50 to generate residual block data and to the adder 62 to reconstruct the coded block for use as a reference frame. The mode selection unit 40 also supplies syntax elements, such as motion vectors, intra-mode indicators, partition information, and other syntax information, to the entropy encoding unit 56. [0127] The motion estimation unit 42 and the motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes.
The motion estimation performed by the motion estimation unit 42 is the process of generating motion vectors, which estimate the motion of video blocks. A motion vector, for example, can indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture (or other coded unit), relative to the current block being coded within the current picture (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some examples, the video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in the decoded picture buffer 64. For example, the video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, the motion estimation unit 42 can perform a motion search relative to full pixel positions and fractional pixel positions and produce a motion vector with fractional pixel precision. [0128] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU with the position of a predictive block of a reference picture. The reference picture can be selected from a first list of reference pictures (List 0) or a second list of reference pictures (List 1), each of which identifies one or more reference pictures stored in the decoded picture buffer. [0129] The motion compensation performed by the motion compensation unit 44 may involve fetching or generating the predictive block based on the motion vector determined by the motion estimation unit 42. Again, the motion estimation unit 42 and the motion compensation unit 44 can be functionally integrated in some examples. Upon receiving the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference picture lists. The adder 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, the motion estimation unit 42 performs motion estimation with respect to the luma components, and the motion compensation unit 44 uses motion vectors calculated based on the luma components for both the chroma components and the luma components. The mode selection unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by the video decoder 30 in decoding the video blocks of the video slice. [0130] The intra-prediction processing unit 46 can intra-predict a current block, as an alternative to the inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, the intra-prediction processing unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and the intra-prediction processing unit 46 (or the mode selection unit 40, in some examples) can select a suitable intra-prediction mode to use from the tested modes.
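The mode selection just described, and the rate-distortion analysis elaborated in the next paragraph, can be illustrated with the following minimal Python sketch. The lambda value, the candidate mode list, and the distortion/bit-count helpers are assumptions for illustration only and are not taken from any standard or reference encoder.

```python
# Illustrative sketch of rate-distortion (RD) based mode selection; not a reference implementation.

def sum_squared_error(original, reconstruction):
    """Distortion measure (SSD) between the original block and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction))

def select_intra_mode(original_block, candidate_modes, encode_fn, lam=10.0):
    """Pick the candidate mode with the lowest RD cost J = D + lambda * R.

    encode_fn(mode, block) is assumed to return (reconstruction, bits) for that mode.
    """
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        reconstruction, bits = encode_fn(mode, original_block)
        distortion = sum_squared_error(original_block, reconstruction)
        cost = distortion + lam * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```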
[0131] For example, the intra-prediction processing unit 46 can calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode that has the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as the bitrate (i.e., the number of bits) used to produce the encoded block. The intra-prediction processing unit 46 can calculate ratios from the distortions and rates for the various encoded blocks in order to determine which intra-prediction mode exhibits the best rate-distortion value for the block. [0132] After selecting an intra-prediction mode for a block, the intra-prediction processing unit 46 may provide information indicative of the intra-prediction mode selected for the block to the entropy encoding unit 56. The entropy encoding unit 56 can encode the information that indicates the selected intra-prediction mode. The video encoder 20 may include, in the transmitted bitstream, configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also called codeword mapping tables), definitions of coding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to be used for each of the contexts. [0133] The video encoder 20 forms a residual video block (e.g., r1(Cb) for the current block Cb) by subtracting the prediction data of the mode selection unit 40 from the original video block being coded. The adder 50 represents the component or components that perform this subtraction operation. The transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. The transform processing unit 52 can perform other transforms that are conceptually similar to the DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used. In either case, the transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform can convert the residual information from a pixel value domain into a transform domain, such as a frequency domain. The transform processing unit 52 can send the resulting transform coefficients t(Cb) to the quantization unit 54. [0134] As described above, the video encoder 20 can produce the residual signal r(Cb) of the currently coded block Cb from the samples of the currently coded block s(Cb) and the predicted samples p(Cb) (e.g., samples predicted by inter-prediction or intra-prediction). The video encoder 20 can perform one or more forward transforms on the residual r(Cb), which results in the transform coefficients t(Cb). The video encoder 20 can then quantize the transform coefficients t(Cb) before entropy encoding. The quantization unit 54 quantizes the transform coefficients to further reduce the bitrate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter.
In some examples, the quantization unit 54 can perform a scan of the matrix that includes the quantized transform coefficients. Alternatively, the entropy encoding unit 56 may perform the scan. [0135] In accordance with the techniques of this disclosure, the quantization unit 54 can be configured to perform a multi-stage quantization process on the transform coefficients t(Cb). Figure 9 is a block diagram illustrating an example quantization unit of a video encoder that can implement the techniques of this disclosure. [0136] As shown in Figure 9, in a first stage, the QPe determination unit 202 can be configured to derive a quantization parameter offset (deltaQP(s(Cb))) for the current block Cb. In one example, the QPe determination unit 202 can be configured to derive deltaQP(s(Cb)) from a lookup table (e.g., DQP LUT 204). The DQP LUT 204 includes the deltaQP values and is accessed by an index derived from the mean of the samples s(Cb) (for example, luma or chroma samples) of block Cb. The equation below shows an example of deriving the quantization parameter offset:

deltaQP(s(Cb)) = LUT_DQP(mean(s(Cb)))    (2)

where LUT_DQP is the lookup table for deltaQP(s(Cb)) and mean(s(Cb)) is the average of the sample values of block Cb. [0137] In other examples, the QPe determination unit 202 can be configured to derive the value of deltaQP(s(Cb)) by a function (for example, a second-order function based on the variance) of some other characteristic of the samples of the coded block, or of bitstream characteristics. The QPe determination unit 202 can be configured to determine the deltaQP value using an algorithm or a lookup table, or it can explicitly derive the deltaQP value using other means. In some examples, the samples used to determine deltaQP( ) may include both luma and chroma samples or, more generally, samples of one or more components of the coded block. [0138] The QPe determination unit 202 can then use the variable deltaQP(Cb) to derive the effective quantization parameter QPe, as shown in Equation (1) above. The QPe determination unit 202 can then supply the QPe value to the first quantization unit 206 and to the inverse quantization unit 208. In a second stage, the first quantization unit 206 performs a forward quantization on the transform coefficients t(Cb) using the derived QPe value. Then, the inverse quantization unit 208 inversely quantizes the quantized transform coefficients using the QPe value, and the inverse transform unit 210 performs an inverse transform (e.g., the inverse of the transform of the transform processing unit 52). This results in the residual block r2(Cb) carrying the distortion introduced by QPe. An equation for the second stage of the process is shown below:

r2(Cb) = InverseTrans( InverseQuant( QPe, ForwardQuant( QPe, t(Cb) ) ) )    (3)

where InverseTrans is an inverse transform process, InverseQuant is an inverse quantization process, and ForwardQuant is a forward quantization process. [0139] In a third stage, the transform processing unit 212 performs one or more forward transforms (for example, the same as those of the transform processing unit 52) on the residual r2(Cb). The second quantization unit 214 performs a forward quantization on the transformed residual using the base quantization parameter QPb. This results in the quantized transform coefficients tq(Cb), as shown in the equation below:

tq(Cb) = ForwardQuant( QPb, ForwardTrans( r2(Cb) ) )    (4)

[0140] Again with reference to Figure 8, after quantization, the entropy encoding unit 56 performs the entropy encoding of the quantized transform coefficients tq(Cb).
For example, the entropy encoding unit 56 can perform context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, the context can be based on neighboring blocks. After entropy encoding by the entropy encoding unit 56, the encoded bitstream can be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval. [0141] The inverse quantization unit 58 and the inverse transform processing unit 60 apply inverse quantization and the inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example, for later use as a reference block. The motion compensation unit 44 can calculate a reference block by adding the residual block to a predictive block of one of the frames of the decoded picture buffer. The motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. The adder 62 adds the reconstructed residual block to the motion-compensated prediction block produced by the motion compensation unit 44 to produce a reconstructed video block for storage in the decoded picture buffer 64. The reconstructed video block may be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame. [0142] Examples of decoder-side processing will now be described. On the decoder side, certain coding tools are dependent on the quantization parameter associated with the QP value used for encoding the current block, or group of blocks. Some non-limiting examples may include: deblocking filters, bilateral filters, loop filters, interpolation filters, entropy codec initialization, or others. [0143] Figure 10 is a block diagram illustrating an example video decoder 30 that can implement the techniques of this disclosure. In the example of Figure 10, the video decoder 30 includes an entropy decoding unit 70, video data memory 71, a motion compensation unit 72, intra-prediction processing unit 74, an inverse quantization unit 76, DPB 82, adder 80, QPe estimation unit 84, DQP LUT 86 and filter unit 88. The video decoder 30 may, in some examples, perform a decoding pass that is generally reciprocal to the encoding pass described with respect to the video encoder 20 (Figure 8). The motion compensation unit 72 can generate prediction data based on motion vectors received from the entropy decoding unit 70, while the intra-prediction processing unit 74 can generate prediction data based on intra-prediction mode indicators received from the entropy decoding unit 70. [0144] The video data memory 71 may store video data, such as an encoded video bitstream, that will be decoded by the components of the video decoder 30. The video data stored in the video data memory 71 may be obtained, for example, from the computer-readable medium 16, for example, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing physical data storage media. The video data memory 71 may form a coded picture buffer (CPB) that stores encoded video data from an encoded video bitstream.
The DPB 82 may be a reference picture memory that stores reference video data for use in decoding video data by the video decoder 30, for example, in intra- or inter-coding modes. The video data memory 71 and the decoded picture buffer 82 may be formed by any of a variety of memory devices, such as DRAM, including SDRAM, MRAM, RRAM, or other types of memory devices. The video data memory 71 and the decoded picture buffer 82 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory 71 may be on-chip with the other components of the video decoder 30, or off-chip with respect to those components. [0145] During the decoding process, the video decoder 30 receives an encoded video bitstream representing the video blocks of an encoded video slice and associated syntax elements from the video encoder 20. The encoded video bitstream may have been encoded by the video encoder 20 using the multi-stage quantization process described above. The encoded video bitstream can also represent video data defined by an HDR and/or WCG color format. The entropy decoding unit 70 of the video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. The entropy decoding unit 70 forwards the motion vectors and other syntax elements to the motion compensation unit 72. In some examples, the entropy decoding unit 70 may decode a syntax element that indicates a base quantization parameter QPb for the blocks of video data to be decoded. The video decoder 30 can receive the syntax elements at the video slice level and/or at the video block level. [0146] When the video slice is coded as an intra-coded slice (I), the intra-prediction processing unit 74 can generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and on data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded slice (i.e., B or P), the motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. The video decoder 30 can build the reference picture lists, List 0 and List 1, using default construction techniques based on the reference pictures stored in the decoded picture buffer. [0147] The motion compensation unit 72 can also perform interpolation based on interpolation filters. The motion compensation unit 72 may use interpolation filters such as those used by the video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of the reference blocks. In that case, the motion compensation unit 72 can determine the interpolation filters used by the video encoder 20 from the received syntax elements and use the interpolation filters to produce the predictive blocks. [0148] The inverse quantization unit 76 performs the inverse quantization, that is, dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoding unit 70. The inverse quantization process may include the use of a base quantization parameter QPb by the video decoder 30 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.
The inverse transform processing unit 78 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain. [0149] After the motion compensation unit 72 generates the prediction block for the current video block based on the motion vectors and other syntax elements, the video decoder 30 forms a decoded video block by summing the residual blocks from the inverse transform processing unit 78 with the corresponding predictive blocks generated by the motion compensation unit 72. The adder 80 represents the component or components that perform this summing operation. [0150] The filter unit 88 can be configured to apply one or more filtering operations to the decoded video data before output and storage in the decoded picture buffer 82. The video block decoded in a given frame or picture is then stored in the decoded picture buffer 82, which stores reference pictures used for subsequent motion compensation. The decoded picture buffer 82 also stores decoded video for later presentation on a display device, such as the display device 32 of Figure 1. Examples of filters that can be applied by the filter unit 88 include deblocking filters, bilateral filters, adaptive loop filters, sample adaptive offset filters, and others. For example, if desired, a deblocking filter can be applied to filter the decoded blocks in order to remove blocking artifacts. Other loop filters (in the coding loop or after the coding loop) can also be used to smooth pixel transitions or otherwise improve the video quality. [0151] In some examples, the parameters of a filter applied by the filter unit 88 may be based on a quantization parameter. As described above, the video data received by the video decoder 30 includes the distortion introduced by the video encoder 20 using the effective quantization parameter QPe, which is greater than the QPb value that is communicated in the bitstream and associated with the current Cb. The filters applied by the filter unit 88 may depend on the QP parameters provided by the bitstream for performance tuning. Accordingly, the video decoder 30 can be configured to derive an estimate of the actual QPe that was applied by the video encoder 20 to the Cb. To that end, the video decoder 30 may include the QPe estimation unit 84 to derive the QPe value. [0152] For example, the QPe estimation unit 84 can be configured to estimate a quantization parameter offset (deltaQP(s(Cb))) for the current block Cb. In one example, the QPe estimation unit 84 can be configured to estimate deltaQP(s(Cb)) from a lookup table (e.g., DQP LUT 86). The DQP LUT 86 includes estimates of the deltaQP values and is accessed by an index derived from the mean of the decoded samples s(Cb) (e.g., luma or chroma samples) of block Cb. The equation below shows an example of deriving the quantization parameter offset: [0153] In other examples, the QPe estimation unit 84 can be configured to estimate the value of deltaQP(s(Cb)) by a function (for example, a second-order function based on the variance) of some other characteristic of the samples of the decoded block, or of bitstream characteristics. The QPe estimation unit 84 can be configured to estimate the deltaQP value using an algorithm or a lookup table, or it can explicitly estimate the deltaQP value using other means.
In some examples, the samples used to determine deltaQP( ) may include both luma and chroma samples or, more generally, samples of one or more components of the decoded block. The QPe estimation unit 84 may then provide the estimated QPe value to the filter unit 88 for use by one or more coding tools implemented by the filter unit 88. [0154] In one example, the filter unit 88 can be configured to perform deblocking filtering. In a non-limiting example of a deblocking implementation, the deblocking process is given below as a change to the deblocking filtering of the HEVC specification. The introduced changes are marked with double underlining: [0155] 8.7.2.5.3 Decision process for luma block edges [0156] The variables QpQ and QpP are set equal to the values QpY_EQ and QpY_EP of the coding units Cbq and Cbp that include the coding blocks containing the sample q0,0 and p0,0, respectively. QpY_EQ and QpY_EP are derived as follows:

QpY_EQ = QpY + deltaQP(s(Cbq))    (5)
QpY_EP = QpY + deltaQP(s(Cbp))

[0157] with the variable deltaQP(s(Cb)) being the offset derived from the lookup table consisting of deltaQP values and accessed by an index derived from the average of the samples s(Cb). [0158] A variable qPL is derived as follows:

qPL = ( ( QpQ + QpP + 1 ) >> 1 )

[0159] 8.7.2.5.5 Filtering process for chroma block edges [0160] The variables QpQ and QpP are set equal to the values QpY_EQ and QpY_EP of the coding units that include the coding blocks containing the sample q0,0 and p0,0, respectively. QpY_EQ and QpY_EP are derived as follows:

QpY_EQ = QpY + deltaQP(s(Cbq))    (7)
QpY_EP = QpY + deltaQP(s(Cbp))

[0161] with the variable deltaQP(s(Cb)) being the offset derived from the lookup table consisting of deltaQP values and accessed by an index derived from the average of the samples s(Cb). [0162] If ChromaArrayType is equal to 1, the variable QpC is determined as specified in Table 8-10 based on the index qPi derived as follows:

qPi = ( ( QpQ + QpP + 1 ) >> 1 ) + cQpPicOffset

[0163] In the example above, QpY is equal to QPb and QpY_EQ is equal to QPe. [0164] In another example, the filter unit 88 can be configured to implement a bilateral (two-sided) filter. The bilateral filter modifies a sample based on a weighted average of the samples in its vicinity, and the weights are derived based on the distance of the neighboring samples from the current sample and on the difference between the sample values of the current sample and of the neighboring samples. [0165] Let x be the location of a current sample value that is filtered based on the samples in its neighborhood N(x). For each sample d(y), for y belonging to N(x), let w(y, x) be the weight associated with the sample at location y used to obtain the filtered version of the sample at x. The filtered version of x, D(x), is obtained as a weighted average of the samples d(y), for y in N(x), normalized by the sum of the weights. [0166] The weights are derived as

w(y, x) = f(y, x, d(y), d(x), QP(Cb))    (9)

[0167] where f( ) is the function that calculates the weights based on the sample locations and sample values. The QP used to encode the block containing the samples can also be an additional argument in the derivation of f( ). In some examples, the QP value of the block containing x is used as the argument to f( ).
In this example, the value of the QP used as an additional argument in f( ) is QPe(Cb), which is derived as follows:

QPe(Cb) = QP(Cb) + deltaQP(d(Cb))    (10)

[0168] where QP(Cb) is the signaled QP value (for example, QPb) for the coded block, and deltaQP(d(Cb)) is the QP offset value obtained based on characteristics of the coded and decoded block, for example the average. Thus, the derived weights are the following:

w(y, x) = f(y, x, d(y), d(x), QPe(Cb))    (11)

[0169] In some examples, the weighting functions are derived separately for luma and chroma. The QP associated with chroma coded blocks can also account for the effect of chroma QP offsets that are derived or signaled in the bitstream, and the derived deltaQP( ) can be a function of the samples of one or more components. [0170] In some examples, the QP used as an additional argument to f( ) can be obtained by taking into account the derived QPe( ) value for the coded block containing the sample at position x and the derived QPe( ) value for the coded block containing the sample at position y. For example, a value derived from the two QPe( ) values, for example their mean, can be selected as the argument to f( ). [0171] In another example of the disclosure, the video decoder 30 can be configured to use multiple DQP LUTs. In some examples, two or more DQP LUT tables may be available at the video decoder 30. The video decoder 30 may be configured to derive an index identifying which of the two or more lookup tables will be used for a specific block edge. The video decoder 30 can be configured to derive the index from syntax elements, from coding information of blocks in the spatio-temporal neighborhood of the current samples, or from statistics of the decoded picture samples. [0172] In another example of the disclosure, the video encoder 20 and the video decoder 30 can be configured to apply the spatially variable quantization with finer block granularities. In some examples, the video encoder 20 may be configured to divide a currently coded block Cb into sub-partitions, each of which is independently processed in accordance with equations (2), (3) and (4) above. Once the reconstructed signal r2 is produced for each of the partitions, the partitions form the r2(Cb) data, which is further processed as shown in equation (5) above. [0173] In the video decoder 30, certain coding tools, for example deblocking, are modified to reflect this partitioning, even though it is not provided by the CU partitioning. For example, deblocking is designed to also filter these virtual block edges, in addition to the TU and PU edges that are currently specified. [0174] In some examples, information about the finer granularity of the block partitioning can be signaled in bitstream syntax elements, e.g., PPS, SPS, slice header, or provided to the decoder as side information. [0175] In some examples, constraints (e.g., effected by a clipping process) on the maximum values of QP, including deltaQP or chroma QP offset values, may be removed or extended to support a wider deviation of the QPe parameters from QPb when using HEVC-like video coding architectures. [0176] The techniques of this disclosure described above may provide the following advantages over other techniques. The techniques of this disclosure described above can avoid deltaQP signaling, inherently resulting in a bitrate reduction of a few percent compared to the deltaQP-based method of supporting HDR/WCG video data.
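As an aside before the remaining advantages are listed, the bilateral filter of equations (9) to (11) above can be made concrete with the short Python sketch that follows. The particular weight function (a Gaussian in spatial distance and in sample difference, with the range spread widened as the estimated QPe grows) is an assumption chosen for illustration; the disclosure only requires that QPe(Cb) be an argument of f( ).

```python
# Illustrative sketch of a bilateral filter whose weights depend on the estimated QPe.
# The Gaussian weight function and its parameters are assumptions for illustration only.
import math

def weight(y, x, d_y, d_x, qp_e, sigma_d=1.0):
    """w(y, x) = f(y, x, d(y), d(x), QPe(Cb)) with an assumed Gaussian form.

    The range spread grows with QPe, so heavier quantization allows stronger smoothing.
    """
    sigma_r = 0.5 * (2.0 ** ((qp_e - 4) / 6.0))      # assumed link between QPe and spread
    spatial = math.exp(-((y - x) ** 2) / (2 * sigma_d ** 2))
    rng = math.exp(-((d_y - d_x) ** 2) / (2 * sigma_r ** 2))
    return spatial * rng

def bilateral_1d(samples, qp_e, radius=2):
    """Filter a 1-D list of decoded samples d( ) with the QPe-dependent weights."""
    out = []
    for x in range(len(samples)):
        num = den = 0.0
        for y in range(max(0, x - radius), min(len(samples), x + radius + 1)):
            w = weight(y, x, samples[y], samples[x], qp_e)
            num += w * samples[y]
            den += w
        out.append(num / den)
    return out

print(bilateral_1d([100, 102, 180, 101, 99], qp_e=38))
```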
[0177] The techniques of this disclosure described above allow equal scaling of all transform coefficients of t(Cb), unlike the techniques in "De-quantization and scaling for next generation containers," J. Zhao, A. Segall, S.-H. Kim, K. Misra (Sharp), JVET document B0054, January 2016. [0178] The techniques of this disclosure described above can provide higher precision estimates of local brightness compared to the techniques in US Patent Application No. 15/595,793, since the decoded values provide a better estimate than the predicted samples. [0179] The techniques of this disclosure described above may allow a finer granularity of deriving and applying deltaQP without an increase in the signaling overhead associated with a deltaQP-based solution. [0180] The techniques of this disclosure described above have a simpler implementation design compared to the scaling-based designs of "De-quantization and scaling for next generation containers" and US Patent Application No. 15/595,793. [0181] Figure 11 is a flowchart illustrating an example encoding method. The video encoder 20, including the quantization unit 54, can be configured to perform the techniques of Figure 11. [0182] In one example of the disclosure, the video encoder 20 can be configured to determine a base quantization parameter for the block of video data (1100), and to determine a quantization parameter offset for the block of video data based on a statistic associated with the block of video data (1102). The video encoder 20 can be further configured to add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter (1104), and to encode the block of video data using the effective quantization parameter and the base quantization parameter (1106). In one example, the base quantization parameter is the same for all blocks of the video data. In one example, the sample values of the video data are defined by a high dynamic range video data color format. [0183] In a further example of the disclosure, to encode the block of video data, the video encoder 20 can be further configured to predict the block to produce residual samples, transform the residual samples to create transform coefficients, quantize the transform coefficients with the effective quantization parameter, inverse quantize the quantized transform coefficients with the effective quantization parameter to produce distorted transform coefficients, inverse transform the distorted transform coefficients to produce distorted residual samples, transform the distorted residual samples, and quantize the transformed distorted residual samples using the base quantization parameter. [0184] In another example of the disclosure, to determine the quantization parameter offset, the video encoder 20 can be further configured to determine the quantization parameter offset from a lookup table. [0185] Figure 12 is a flowchart illustrating an example decoding method. The video decoder 30, including the inverse quantization unit 76, the QPe estimation unit 84, and the filter unit 88, can be configured to perform the techniques of Figure 12.
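Before detailing the decoding method of Figure 12, the encoding method of Figure 11 summarized above (paragraphs [0182] and [0183], equations (1) to (4)) can be illustrated with a minimal Python sketch. The transforms are placeholders passed in by the caller, the scalar quantizer uses a simplified step size of 2^((QP − 4)/6), and the LUT is modeled as a callable; these are assumptions for illustration only and do not reproduce an actual HEVC/VVC quantizer.

```python
# Illustrative sketch of the multi-stage quantization of equations (1)-(4); not a reference encoder.

def quant(coeffs, qp):
    """Simplified forward scalar quantization (assumed step size 2^((QP - 4)/6))."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return [round(c / step) for c in coeffs]

def dequant(levels, qp):
    """Simplified inverse scalar quantization."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return [l * step for l in levels]

def multi_stage_quantize(residual, qp_b, dqp_lut, fwd_t, inv_t, mean_samples):
    """Return tq(Cb) following equations (1)-(4).

    fwd_t / inv_t are the forward / inverse transforms; dqp_lut maps the mean of the
    block samples s(Cb) to a deltaQP value; mean_samples is mean(s(Cb)).
    """
    delta_qp = dqp_lut(mean_samples)            # equation (2)
    qp_e = qp_b + delta_qp                      # equation (1)
    t = fwd_t(residual)                         # t(Cb)
    r2 = inv_t(dequant(quant(t, qp_e), qp_e))   # equation (3): residual carrying QPe distortion
    return quant(fwd_t(r2), qp_b)               # equation (4): coefficients quantized with QPb
```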
[0186] In one example of the disclosure, the video decoder 30 can be configured to receive an encoded block of the video data, wherein the encoded block of the video data has been encoded using an effective quantization parameter and a base quantization parameter, where the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter (1200). The video decoder 30 can be further configured to determine the base quantization parameter used to encode the encoded block of video data (1202), and decode the encoded block of video data using the base quantization parameter to create a decoded block of video data (1204). The video decoder 30 can be further configured to determine a quantization parameter offset estimate for the decoded block of video data based on a statistic associated with the decoded block of video data (1206), and add the quantization parameter offset estimate to the base quantization parameter to create an estimate of the effective quantization parameter (1208). The video decoder 30 may be further configured to perform one or more filtering operations on the decoded block of video data as a function of the effective quantization parameter estimate (1210). In one example, the base quantization parameter is the same for all blocks of the video data. In another example, the sample values of the video data are defined by a high dynamic range video data color format. [0187] In another example of the disclosure, to determine the base quantization parameter, the video decoder 30 can be further configured to receive a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter. [0188] In another example of the disclosure, to decode the block of video data, the video decoder 30 can be further configured to entropy decode the encoded block of video data to determine quantized transform coefficients, inverse quantize the quantized transform coefficients using the base quantization parameter to create transform coefficients, inverse transform the transform coefficients to create residual values, and perform a prediction process on the residual values to create the decoded block of video data. [0189] In another example of the disclosure, to determine the quantization parameter offset estimate for the decoded block of the video data, the video decoder 30 can be further configured to determine an average of the sample values of the decoded block of the video data, and determine the quantization parameter offset estimate for the decoded block of the video data using the average of the sample values of the decoded block of the video data. [0190] In another example of the disclosure, to determine the estimate of the quantization parameter offset, the video decoder 30 can be further configured to determine the estimate of the quantization parameter offset from a lookup table, where the average of the sample values is an entry in the lookup table. [0191] In another example of the disclosure, the video decoder 30 may be further configured to determine the lookup table from a plurality of lookup tables.
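The following minimal Python sketch illustrates the decoder-side estimation just described: the mean of the decoded samples indexes a DQP lookup table, and the resulting offset estimate is added to the signaled base QP. The table contents and the bucketing of the mean into table entries are assumptions for illustration only; an actual system would define its own LUT(s). The assumed LUT gives larger offsets to darker blocks, consistent with the earlier observation that distortion in dark regions of an ST-2084 representation tends to be less perceptible.

```python
# Illustrative sketch of estimating QPe at the decoder from decoded sample statistics.
# The LUT values and the bucketing step are assumptions, not values from this disclosure.

EXAMPLE_DQP_LUT = {0: 6, 1: 5, 2: 4, 3: 3, 4: 2, 5: 1, 6: 0, 7: 0}  # brightness bucket -> deltaQP estimate

def estimate_qpe(decoded_samples, qp_b, dqp_lut=EXAMPLE_DQP_LUT, bucket_size=128):
    """Estimate QPe = QPb + deltaQP(d(Cb)) from the mean of the decoded block samples."""
    mean_value = sum(decoded_samples) / len(decoded_samples)
    index = min(int(mean_value // bucket_size), max(dqp_lut))
    return qp_b + dqp_lut[index]

# Example: a dark 10-bit block (mean around 40) with a signaled base QP of 32.
block = [40] * 64
print(estimate_qpe(block, qp_b=32))  # QPb plus the assumed offset for a dark block
```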
[0192] In another example of the disclosure, to perform one or more filtering operations on the decoded block of video data, the video decoder 30 can be further configured to apply a deblocking filter to the decoded block of video data using the effective quantization parameter. [0193] In another example of the disclosure, to perform one or more filtering operations on the decoded block of video data, the video decoder 30 can be further configured to apply a bilateral filter to the decoded block of video data using the effective quantization parameter. [0194] Certain aspects of this disclosure have been described in relation to HEVC, extensions of the HEVC standard, and examples of JEM and VVC for illustrative purposes. However, the techniques described in this disclosure may be useful for other video coding processes, including other standard or proprietary coding processes not yet developed. [0195] A video coder, as described in this disclosure, may refer to a video encoder or a video decoder. Similarly, a video coding unit can refer to a video encoder or a video decoder. Similarly, video coding may refer to video encoding or video decoding, as applicable. [0196] It should be recognized that, depending on the example, certain actions or events of any of the techniques described in this document may be executed in a different sequence, may be added, merged or omitted (e.g., not all of the described actions or events are necessary for the practice of the techniques). Also, in certain examples, actions or events may be executed simultaneously, for example, through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially. [0197] In one or more examples, the functions described can be implemented in hardware, software, firmware, or any combination thereof. [0198] By way of example, and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio and microwave, then coaxial cable, fiber optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to tangible, non-transitory storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above may also be included within the scope of computer-readable media. [0199] Instructions may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent discrete or integrated logic circuitry.
Accordingly, the term "processor", as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Furthermore, in some examples, the functionality described in this document may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

[0200] [0200] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless device, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined into one codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, together with suitable software and/or firmware.

[0201] [0201] Several examples have been described. These and other examples are within the scope of the claims that follow.
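Paragraphs [0192] and [0193] leave the filter derivation to the underlying codec. As a hedged sketch only, the estimated effective quantization parameter might stand in for the block quantization parameter when a filter strength is derived; the strength mapping and the toy edge filter below are invented placeholders, not values or functions from HEVC, JEM, or VVC.

```python
# Illustrative sketch only: driving an in-loop filter with the effective QP estimate
# rather than the uniform base QP, as suggested by paragraphs [0192]-[0193].
# The strength mapping and the one-dimensional edge smoothing are invented.

def filter_strength_from_qp(effective_qp_estimate, qp_max=51):
    """Hypothetical mapping from an (estimated) effective QP to a strength in [0, 1]."""
    qp = max(0, min(qp_max, effective_qp_estimate))
    return qp / qp_max  # coarser quantization -> stronger smoothing (placeholder rule)

def deblock_edge(left_samples, right_samples, effective_qp_estimate):
    """Toy deblocking of the two samples adjacent to a block edge."""
    strength = filter_strength_from_qp(effective_qp_estimate)
    out_left, out_right = list(left_samples), list(right_samples)
    edge_avg = (left_samples[-1] + right_samples[0]) / 2
    out_left[-1] = round((1 - strength) * left_samples[-1] + strength * edge_avg)
    out_right[0] = round((1 - strength) * right_samples[0] + strength * edge_avg)
    return out_left, out_right

# A block decoded with base QP 32 but an estimated offset of +4 would be filtered
# as if its QP were 36, i.e., somewhat more strongly than the base QP alone suggests.
filtered = deblock_edge([100, 104, 120], [160, 158, 156], effective_qp_estimate=36)
```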
Claims:
Claims (34)

[1] 1. A method of decoding video data, the method comprising: receiving an encoded block of video data, wherein the encoded block of video data has been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; determining the base quantization parameter used to encode the encoded block of video data; decoding the encoded block of video data using the base quantization parameter to create a decoded block of video data; determining a quantization parameter offset estimate for the decoded block of video data based on a statistic associated with the decoded block of video data; adding the quantization parameter offset estimate to the base quantization parameter to create an effective quantization parameter estimate; and performing one or more filtering operations on the decoded block of video data as a function of the effective quantization parameter estimate.

[2] The method of claim 1, wherein the base quantization parameter is the same for all blocks of the video data.

[3] The method of claim 1, wherein sample values of the video data are defined by a high dynamic range video data color format.

[4] The method of claim 1, wherein determining the base quantization parameter comprises: receiving a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter.

[5] The method of claim 1, wherein decoding the encoded block of video data comprises: entropy decoding the encoded block of video data to determine quantized transform coefficients; inversely quantizing the quantized transform coefficients using the base quantization parameter to create transform coefficients; inversely transforming the transform coefficients to create residual values; and performing a prediction process on the residual values to create the decoded block of video data.

[6] The method of claim 1, wherein determining the quantization parameter offset estimate for the decoded block of video data comprises: determining an average of sample values of the decoded block of video data; and determining the quantization parameter offset estimate for the decoded block of video data using the average of the sample values of the decoded block of video data.

[7] The method of claim 6, wherein determining the quantization parameter offset estimate comprises: determining the quantization parameter offset estimate from a lookup table, wherein the average of the sample values is an input to the lookup table.

[8] The method of claim 7, further comprising: determining the lookup table from a plurality of lookup tables.

[9] The method of claim 1, wherein performing the one or more filtering operations on the decoded block of video data comprises: applying a deblocking filter to the decoded block of video data using the effective quantization parameter.

[10] The method of claim 1, wherein performing the one or more filtering operations on the decoded block of video data comprises: applying a bilateral filter to the decoded block of video data using the effective quantization parameter.
[11] 11. A method of encoding video data, the method comprising: determining a base quantization parameter for a block of video data; determining a quantization parameter offset for the block of video data based on a statistic associated with the block of video data; adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and encoding the block of video data using the effective quantization parameter and the base quantization parameter.

[12] The method of claim 11, wherein the base quantization parameter is the same for all blocks of the video data.

[13] The method of claim 11, wherein sample values of the video data are defined by a high dynamic range video data color format.

[14] The method of claim 11, wherein encoding the block of video data comprises: predicting the block to produce residual samples; transforming the residual samples to create transform coefficients; quantizing the transform coefficients with the effective quantization parameter; inversely quantizing the quantized transform coefficients with the effective quantization parameter to produce distorted transform coefficients; inversely transforming the distorted transform coefficients to produce distorted residual samples; transforming the distorted residual samples; and quantizing the transformed distorted residual samples using the base quantization parameter.

[15] The method of claim 11, wherein determining the quantization parameter offset comprises determining the quantization parameter offset from a lookup table.

[16] 16. An apparatus configured to decode video data, the apparatus comprising: a memory configured to store an encoded block of video data; and one or more processors in communication with the memory, the one or more processors configured to: receive the encoded block of video data, wherein the encoded block of video data has been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; determine the base quantization parameter used to encode the encoded block of video data; decode the encoded block of video data using the base quantization parameter to create a decoded block of video data; determine a quantization parameter offset estimate for the decoded block of video data based on a statistic associated with the decoded block of video data; add the quantization parameter offset estimate to the base quantization parameter to create an effective quantization parameter estimate; and perform one or more filtering operations on the decoded block of video data as a function of the effective quantization parameter estimate.

[17] The apparatus of claim 16, wherein the base quantization parameter is the same for all blocks of the video data.

[18] The apparatus of claim 16, wherein sample values of the video data are defined by a high dynamic range video data color format.

[19] The apparatus of claim 16, wherein to determine the base quantization parameter, the one or more processors are further configured to: receive a base quantization parameter syntax element in an encoded video bitstream, a value of the base quantization parameter syntax element indicating the base quantization parameter.
[20] The apparatus of claim 16, wherein to decode the encoded block of video data, the one or more processors are further configured to: entropy decode the encoded block of video data to determine quantized transform coefficients; inversely quantize the quantized transform coefficients using the base quantization parameter to create transform coefficients; inversely transform the transform coefficients to create residual values; and perform a prediction process on the residual values to create the decoded block of video data.

[21] The apparatus of claim 16, wherein to determine the quantization parameter offset estimate for the decoded block of video data, the one or more processors are further configured to: determine an average of sample values of the decoded block of video data; and determine the quantization parameter offset estimate for the decoded block of video data using the average of the sample values of the decoded block of video data.

[22] The apparatus of claim 21, wherein to determine the quantization parameter offset estimate, the one or more processors are further configured to: determine the quantization parameter offset estimate from a lookup table, wherein the average of the sample values is an input to the lookup table.

[23] The apparatus of claim 21, wherein the one or more processors are further configured to: determine the lookup table from a plurality of lookup tables.

[24] The apparatus of claim 16, wherein to perform the one or more filtering operations on the decoded block of video data, the one or more processors are further configured to: apply a deblocking filter to the decoded block of video data using the effective quantization parameter.

[25] The apparatus of claim 16, wherein to perform the one or more filtering operations on the decoded block of video data, the one or more processors are further configured to: apply a bilateral filter to the decoded block of video data using the effective quantization parameter.

[26] 26. An apparatus configured to encode video data, the apparatus comprising: a memory configured to store a block of video data; and one or more processors in communication with the memory, the one or more processors configured to: determine a base quantization parameter for the block of video data; determine a quantization parameter offset for the block of video data based on a statistic associated with the block of video data; add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and encode the block of video data using the effective quantization parameter and the base quantization parameter.

[27] The apparatus of claim 26, wherein the base quantization parameter is the same for all blocks of the video data.

[28] The apparatus of claim 26, wherein sample values of the video data are defined by a high dynamic range video data color format.
[29] The apparatus of claim 26, wherein to encode the block of video data, the one or more processors are further configured to: predict the block to produce residual samples; transform the residual samples to create transform coefficients; quantize the transform coefficients with the effective quantization parameter; inversely quantize the quantized transform coefficients with the effective quantization parameter to produce distorted transform coefficients; inversely transform the distorted transform coefficients to produce distorted residual samples; transform the distorted residual samples; and quantize the transformed distorted residual samples using the base quantization parameter.

[30] The apparatus of claim 26, wherein to determine the quantization parameter offset, the one or more processors are further configured to determine the quantization parameter offset from a lookup table.

[31] 31. An apparatus configured to decode video data, the apparatus comprising: means for receiving an encoded block of video data, wherein the encoded block of video data has been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; means for determining the base quantization parameter used to encode the encoded block of video data; means for decoding the encoded block of video data using the base quantization parameter to create a decoded block of video data; means for determining a quantization parameter offset estimate for the decoded block of video data based on a statistic associated with the decoded block of video data; means for adding the quantization parameter offset estimate to the base quantization parameter to create an effective quantization parameter estimate; and means for performing one or more filtering operations on the decoded block of video data as a function of the effective quantization parameter estimate.

[32] 32. An apparatus configured to encode video data, the apparatus comprising: means for determining a base quantization parameter for a block of video data; means for determining a quantization parameter offset for the block of video data based on a statistic associated with the block of video data; means for adding the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and means for encoding the block of video data using the effective quantization parameter and the base quantization parameter.
[33] 33. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to: receive an encoded block of video data, wherein the encoded block of video data has been encoded using an effective quantization parameter and a base quantization parameter, wherein the effective quantization parameter is a function of a quantization parameter offset added to the base quantization parameter; determine the base quantization parameter used to encode the encoded block of video data; decode the encoded block of video data using the base quantization parameter to create a decoded block of video data; determine a quantization parameter offset estimate for the decoded block of video data based on a statistic associated with the decoded block of video data; add the quantization parameter offset estimate to the base quantization parameter to create an effective quantization parameter estimate; and perform one or more filtering operations on the decoded block of video data as a function of the effective quantization parameter estimate.

[34] 34. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to: determine a base quantization parameter for a block of video data; determine a quantization parameter offset for the block of video data based on a statistic associated with the block of video data; add the quantization parameter offset to the base quantization parameter to create an effective quantization parameter; and encode the block of video data using the effective quantization parameter and the base quantization parameter.
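For illustration only, and not as part of the claims, the following sketch walks through the two-stage quantization recited in claims 11 and 14 under simplifying assumptions: an identity stand-in for the codec's transform, a uniform scalar quantizer with an assumed HEVC-like step size that doubles every six QP, and invented function names.

```python
# Illustrative sketch only of the encoding flow of claims 11 and 14: quantize with
# the effective QP, reconstruct the distorted residual, then transform and quantize
# again with the base QP. The identity "transform" and the uniform scalar quantizer
# are placeholders for the codec's actual coding tools.

def qp_to_step(qp):
    return 2 ** (qp / 6.0)  # assumed HEVC-like step size, doubling every 6 QP

def quantize(values, qp):
    step = qp_to_step(qp)
    return [round(v / step) for v in values]

def inverse_quantize(levels, qp):
    step = qp_to_step(qp)
    return [level * step for level in levels]

def transform(residual):        # placeholder for the forward transform
    return list(residual)

def inverse_transform(coeffs):  # placeholder for the inverse transform
    return list(coeffs)

def encode_block(residual, base_qp, qp_offset):
    effective_qp = base_qp + qp_offset                        # claim 11
    coeffs = transform(residual)
    levels = quantize(coeffs, effective_qp)                   # first stage: effective QP
    distorted_coeffs = inverse_quantize(levels, effective_qp)
    distorted_residual = inverse_transform(distorted_coeffs)  # carries the extra distortion
    coeffs_2 = transform(distorted_residual)
    return quantize(coeffs_2, base_qp)                        # second stage: base QP

# A residual first quantized at effective QP 36, then re-quantized at base QP 32.
levels = encode_block([140, -70, 30, 0], base_qp=32, qp_offset=4)
```

Under these assumptions, only the base quantization parameter is needed to invert the second stage at the decoder, while the effect of the effective quantization parameter is carried in the distortion of the residual, which is why the decoder of claim 1 estimates the offset from block statistics rather than receiving it.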
Similar technologies:

Publication number | Publication date | Patent title
BR112020006985A2 | 2020-10-06 | Video encoding with spatially variable quantization adaptable to content
US11228770B2 | 2022-01-18 | Loop sample processing for high dynamic range and wide color gamut video coding
US10778978B2 | 2020-09-15 | System and method of cross-component dynamic range adjustment in video coding
JP2018515018A | 2018-06-07 | Dynamic range adjustment for high dynamic range and wide color gamut video coding
KR20180016383A | 2018-02-14 | Content-adaptive application of fixed transfer function to high dynamic range and/or wide color gamut video data
US11190779B2 | 2021-11-30 | Quantization parameter control for video coding with joined pixel/transform based quantization
EP3304911A1 | 2018-04-11 | Processing high dynamic range and wide color gamut video data for video coding
BR112020013979A2 | 2020-12-08 | SIGNALING MECHANISMS FOR EQUAL TRACKS AND OTHER DRA PARAMETERS FOR VIDEO ENCODING
KR20180016379A | 2018-02-14 | Adaptive constant-luminance approach for high dynamic range and wide color gamut video coding
JP2019528017A | 2019-10-03 | Video coding tool for in-loop sample processing
WO2020224545A1 | 2020-11-12 | An encoder, a decoder and corresponding methods using an adaptive loop filter
Family patents:

Publication number | Publication date
WO2019075060A1 | 2019-04-18
US20190116361A1 | 2019-04-18
EP3695601A1 | 2020-08-19
SG11202001990YA | 2020-04-29
CN111194551A | 2020-05-22
TW201924336A | 2019-06-16
US11095896B2 | 2021-08-17
Legal status:

2021-11-23 | B350 | Update of information on the portal [chapter 15.35 patent gazette]
Priority:
Application number | Filing date | Patent title
US 62/571,732 (US201762571732P) | 2017-10-12 |
US 16/155,344 | 2018-10-09 | Video coding with content adaptive spatially varying quantization
PCT/US2018/055211 (WO2019075060A1) | 2018-10-10 | Video coding with content adaptive spatially varying quantization